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Abstract 


In the present thesis, coarse-grained molecular dynamics simulations and 
relevant continuum models are investigated. In the first part we present a com- 
putational analysis on the driven transport of Maltose Binding Protein (MBP) 
across nano-channels in the framework of coarse-grained modeling. The work 
is motivated by recent experiments on voltage driven transport of MBP across 
nanopores exploring the influence of denaturation on translocation pathways. 
Our simplified approach allows a statistical mechanics analysis of the transport 
phenomenology which is useful to investigate the influence of denaturation from 
a structural point of view. Numerical results are qualitatively in agreement with 
experimental data. They reproduce a rich phenomenology, from fast transport 
to long blockades (Fig. 1) and, to some extent, bumping events. Specifically, we 
identify and characterize short and long channel blockades, associated with the 
translocation of denaturated and folded MBP structures, respectively. We show 
that long blockades are related to stall events where MBP undergoes specific and 
reproducible structural rearrangements. To clarify the origin of the stalls, the 


a Stall 1 " ? 
@ Mean residence time mi op 


Stall 2 


Figure 1: Stall snapshots during traslocation of folded MBP, stastistics of residence time 
and native contact map. 


stick-and-slip translocation is compared with mechanical unfolding pathways 
obtained via steered molecular dynamics. This comparison clearly shows that 
translocation pathway differs significantly from free-space unfolding dynamics 
and strongly suggests that stalling events are preferentially determined by MBP 
regions with higher density of long-range native interactions. This result might 
constitute a possible criterion to predict a-priori statistical features of protein 
translocation from structural analysis. 
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Customarily, protein dynamics is summarized in low-dimensional models 
where only the evolution of few order parameters is considered as an appropri- 
ate description of the complete dynamics of the atomic cluster constituting the 
molecule. We show that long stall events and main translocation features are es- 
sentially maintained in the 1D description of the process selected in the present 
work, where the evolution of an appropriate collective variable and the relevant 
free energy landscape are considered. Moreover, we show that both free energy 
barriers and translocation bottlenecks are strongly associated with the structural 
properties of the folded proteins. In particular, free-energy ramps systemati- 
cally correspond to the regions along the backbone chain with greater density of 
long-range native contacts with the untranslocated portion of the protein itself. 
These areas are also responsible for the stalls during nanopore transport. Also, 
this knowledge allow us to decompose the free energy trends into the different 
contributions of internal energy and entropic effect, pointing out the dominant 
terms. Here, results relevant to two different proteins and to a mutant one are 
considered and discussed. 

In the second part of the thesis, we enrich the standard 1D continuum view 
by proposing a continuum model for the description of the dynamics of isolated 
macromolecules. We adopt a second-rank tensor as a descriptor of the macro- 
molecular shape and identify the action governing its dynamics by means of an 
identification procedure from a discrete scheme, based on power equivalence 
and Cauchy-Born rule. We compare molecular dynamics stretching simulations 
with the continuum model by starting from discrete toy schemes, going on with 
increasing complexity, and ending with the analysis of the Ubiquitin protein. The 
results indicate limitations in the approach in case of unconstrained molecular 
dynamics while they show appropriateness for driven dynamics (Fig. 2). Also 
MBP is considered by addressing primarly translocation simulations, a particu- 
lar constrained dynamics where the selected approach confirms its appropriate- 


Figure 2: Snapshots of molecule Ubiquitin and relevant morphological descriptor during 
a mechanical pulling simulation. 


ness. 
Finally, to overcome the shortcomings stemming from possible lack of ho- 
mogeneous deformation Cauchy-Born rule is based on, we investigate some al- 
ternative heuristic approaches, which appear adequate in describing the protein 
shape evolution also in uncostrained dynamics, such as free equilibrations. 
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Chapter 1 
Introduction 


1. Preamble 


The present dissertation is supervised by Prof. Paolo Maria Mariano of Uni- 
versity of Florence and Scuola Normale Superiore of Pisa. 

It is developed in collaboration with Prof. Carlo Massimo Casciola and Dr. 
Mauro Chinappi of Sapienza University of Rome and Dr. Fabio Cecconi, Insti- 
tute of Complex Systems, National Research Council of Rome. 

The focus of the work is on the transport of proteins across nanopores, a 
process called translocation. The work is based on coarse-grained molecular dy- 
namics simulations exploiting the so-called Go-like model, [1, 2]. Equilibration, 
mechanical stretching and transport simulations are developed. 

A standard 1D continuum approach based on the Langevin equation in the 
potential of the mean force relevant to a suitably-defined reaction coordinate is 
investigated. The main aim is to establish its suitability to retain and interpret 
the complex three-dimensional molecular dynamics phenomenology. 

Non-conventional continuum modeling is also considered. In this perspec- 
tive a second rank tensor is defined in order to summarize the three-dimensional 
shape evolution of clusters of atoms in space, in turn undergoing several differ- 
ent dynamics. The analysis is performed in light of the mechanics of complex 
materials [3]. 


2. Translocation in biology 


Translocation, namely the passage of molecules through biological pores, 
is a fundamental process occurring in many natural phenomena, [4]. It 
involves the exchange of proteins, ions, energy sources, genetic information 
and any particle or aggregate that plays a role in cell functioning and living. In 
particular, biopolymer translocation across membranes is ubiquitous in biology 
and includes the passage of RNA strands inside nuclear pores [5], absorption of 
oligonucleotides on suitable sites [6], transport of proteins back and forth from 
the cells [7], mitochondrial import, protein degradation by ATP-dependent 
proteases and protein synthesis, [8, 9]. 
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Cells are filled with all kinds of nanopores, committed to an overwhelming 
range of functions. Proteins hold an important role in transport processes across 
cell membranes. Indeed the latter are generally composed of phospholipid bilay- 
ers and the channels across them by transmembrane proteins. The transport can 
be either passive or active. The former one can take place by diffusion or by in- 
terposition of carriers to which the transported substances bonded. In the active 
case instead, transmembrane proteins use the Adenosine-Tri-Phosphate chemi- 
cal energy to drive a molecule against its concentration gradient. A mechanism 
of selection, based on the interaction between signal recognition particles and 
relative receptors, allows the transmembrane proteins to detect the system to be 
transported [7]. 

As it will be described in Sec. 3, the discovery of the protein nature of 
transmembrane pores fostered the synthesis of conductive channels to perform 
in vitro translocation experiments. Although the first pioneering experimental 
work on a transmembrane biopore embedded in an artificial lipid bilayer, both 
housed in a microfluidic cell, dates back to the 90’s [10], clarifying the physical 
and chemical phenomena involved in nanopore transport will remain a fertile 
field of research for the years to come, as it is only recently that scientists have 
developed suitable techniques to adequately tackle this issue. 

Proteins are not only the basic constituent of transmembrane channels, but 
are also commonly transported between organelles and cells. Very often mem- 
brane apertures are not large enough to let the proteins enter in their native 
(folded) conformations. For example, the narrowest constriction in the pro- 
teosome organelle is only 13 A in diameter. It is thus necessary for the pro- 
teins to unfold in advance or during their passage across the pore [11]. Such 
co-translocation unfolding rate is orders of magnitude faster than chemical or 
thermal denaturation, suggesting a different unfolding mechanism [9, 12]. A 
natural and appealing explanation of this gap is to suppose the process to be 
driven by electromechanical forces. 


2.1. Protein molecules 


Proteins are complex polymers composed of smaller elementary molecules 
called amino acids. The specific monomer sequence determines their three- 
dimensional structure, which is in turn responsible for the biological function 
of the protein itself. Proteins influence a large number of phenomena in life sci- 
ence. For instance, some of them are involved in the transmission of signals 
between cells and tissues (hormones), other ones support the immune system 
against pathogens (antibodies); moreover enzymes cause or foster the onset of 
chemical reactions otherwise impossible at physiological conditions. In addi- 
tion, proteins are responsible for the regulation of gene expression by bonding 
with specific nucleic acid targets. Structural proteins are fundamental in con- 
verting chemical energy into mechanical work in muscles, besides constituting 
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part of the scaffold of tissues such as hair, nails, tendons and bones. Finally, par- 
ticular proteins store and transport particles, control the passage of molecules 
across membranes and drive the electron current in plant photosynthesis. Pro- 
teins can be subdivided in two main categories: 


1. globular proteins, 
2. fibrous proteins. 


The former are the majority, are soluble and their three-dimensional structure 
is almost spherical. The latter are insoluble and do not have a predetermined 
shape, although often are rod-like. 

The conformation that allows a protein to comply with its biological task is 
said native configuration, or also folded, natured or compact state. There are sev- 
eral techniques to achieve protein denaturation. Unfolded states can be obtained, 
for example, either thermally, chemically or even mechanically, as in atomic force 
microscope experiments or by using optical tweezers. 

In nature there exist 20 different standard amino acids that constitute, as 
already mentioned, the backbone chain of the protein molecules. Protein size 
ranges from 50 up to more than 3000 amino acids. However, the latter can bond 
one another forming also small peptides or polypeptides with no specific three- 
dimensional shape. Nineteen amino acids have the same general structure (de- 
picted in Fig. 1.1) and differ only in the chemical composition of the side chain R 
(residue chain). The side chain of the 20th (prolyne) is standard, but it is bonded 
to a nitrogen atom instead than to a carbon atom. The latter is called a-carbon 
atom. Other carbon atoms along the side chain are termed 8, y, ô, ... accordingly 
to their position with respect to the first one. Very often amino acids are simply 
called residues in the biological parlance. 


Basic Amino Acid Structures 


ATR C-OH OH 


Amino ik Carboxylic Acid 
Group Group 
R - carbon 
Side C sa 


Figure 1.1: General representation of amino acids. Main components are highlighted and 


labeled. 


3. Voltage-driven experiments and nanopore technology 


The basic apparatus to perform voltage-driven translocation experiments 
is constituted by an electro-chemical micro-fluidic cell, as the one depicted in 
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Fig. 1.2 panel a, which represents the evolution of the Coulter resistive counter 
[13]. In such a device two chambers (cis and trans), which contain a buffer so- 


Figure 1.2: Snapshots of panels d and c are obtained using VMD software [14]. Panel 
a: Sketch of a microfluidic experimental apparatus for voltage-driven translocation ex- 
periments. A magnification of the sensing area of the device (nanopore) is also provided. 
Panel b: Standard ion current blockade event at the passage of a molecule across the chan- 
nel. Panel c: Side view of the a-hemolysin transmembrane protein. Panel d: Character- 
istic eptametric structure of the a-hemolysin. This toxin is in fact composed of seven 
elementary proteins arranged in a circular fashion. 


lution, are connected by a pipe that also hosts the sensing area, i.e. a membrane 
where a nanopore is inserted (single nanopore systems). An applied voltage, usu- 
ally of the order of some mV, generates an ionic current through the channel 
that can be recorded by standard electro-physiological techniques like patch- 
clamp [15]. Once biopolymers (with a net charge at the pH of the solution) 
are introduced (on the cis-side), they cross the pore driven by thermal fluctu- 
ations (in the approaching to the pore) and by the voltage (across the pore itself, 
where the majority of the voltage drop takes place) and flow into the opposite 
chamber (trans-side). The passage temporary clogs the channel and provokes 
a detectable ion current drop, Fig. 1.2 panel b, which strongly depends on the 
chemical and physical properties of the molecule that occupies the pore. For this 
reason, nanopore systems can work as efficient devices to operate at the molec- 
ular level in order to characterize several different biological components [16]. 
Nanopore-based technology is believed a promising resource offering powerful 
tools for detection [17], manipulation and sequencing of macromolecules [18], 
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although meaningful data analysis still remains challenging [19]. Moreover, the 
duration of the current drop is a direct measure of the translocation time, which 
is in turn used to characterize a single transport event, along with the intensity 
of the blockage itself. Clusters of events, usually coupled with changes in ex- 
ternal conditions such as temperature or salt concentration, can as well provide 
information about the composition of the solution. 

In voltage-driven translocation experiments, a widely exploited transmem- 
brane protein is the well-known a-hemolysin (@HL), a 6-barrel toxin expressed 
by the the very common bacterium Staphylococcus Aureus [10]. @HL is very 
stable in-vitro and spontaneously migrates into lipid bilayers, considerably sim- 
plifying the building up of the experimental apparatus. Therefore it has been em- 
ployed in several cases both for DNA/RNA strands and for protein translocation 
[10, 16, 20-24]. Its crystallographic structure was determined with a resolution 
of 1.9 A [25] and a schematic view is given in panels c and d of Fig. 1.2. 

Another biological nanopore that worth to be mentioned for similar appli- 
cations is the Aerolysin channel [26], recently used by Pastoriza-Gallego et al. 
to study the translocation of a macromolecule called Maltose Binding protein 
(27, 28]. 

Other pioneering experimental studies concerned detecting polypeptides 
not only in the pore lumen but also in the aqueous phase. This analysis has been 
achieved for the first time by Movileanu and co-workers [29, 30], who were able 
to reveal the presence of a particular protein in the proximity of the biopore 
by attaching a flexible linker within the large cavity of the a-hemolysin pore 
with a specific ligand at its end. Inspired by this work, Kong and Muthukumar 
performed Langevin molecular dynamics simulations and Poisson-Nerst-Plank 
calculations to model the fluctuations of the linker inside the vestibule [31], eluci- 
dating the dynamics of the single-molecule captures in the aqueous phase. Even 
if several other experiments worth to be referred, it is evident that the fundamen- 
tal analyses just mentioned open the way to theoretical modeling and numerical 
investigation of an enormous amount of phenomena. Moreover, as Movileanu 
delineates [24], nanopores represent single-molecule probes with several pos- 
sible applications. They can be employed to reveal many features of important 
biomolecules like folding state, backbone flexibility, mechanical stability, bind- 
ing affinity and charge distribution, in addition to the possibility of revealing the 
biophysical rules that govern the transport through transmembrane channels. 
However, the main idea that fostered the development of voltage-driven translo- 
cation experiments, since the very first application, has always been the one to 
identify and fast-sequence the molecule that is clogging the throat section of the 
channel. Although the original approach concerned only DNA/RNA strands, 
protein sequencing is currently deemed even more interesting than DNA char- 
acterization. Indeed, whereas several fast and inexpensive techniques to perform 
DNA read-out already exist, the achievement of information on protein struc- 
tures and amino acid order through standard experimental approaches such as x- 
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ray crystallography or nucleic magnetic resonance is still challenging and time- 
consuming. However protein sequencing is even more complex than DNA for 
several reasons. The latter in fact is linear and homogeneously charged, differ- 
ently from a structured polymer. Also, the ability to discern between different el- 
ementary molecules has to be superior for protein characterization. Indeed DNA 
or RNA are constituted by only five different types of oligonucleotides while the 
basic constituents of protein molecules are the 20 standard amino acids. 

Several groups have already investigated the interactions of small polypep- 
tides, including a-helical [32-34] and 6-barrel [35] hairpin polypeptides, or 
even whole macromolecules [23, 36, 37] with the @HL or anthrax pores. A 
crucial step toward single-molecule differentiation and sequencing has been re- 
cently carried out by Talaga and Li [38]. They have distinguished from each 
other different proteins in solution by modeling the characteristic ion current 
drop across a solid state nanopore through a simple Ohmic term, multiplied by 
a corrective factor that takes into account the relative geometry of the protein 
and the channel. This approach will be probably delved deeper in the near fu- 
ture to take into account other physical and chemical aspects of the molecules 
and of the whole system. 

As already stated, first experiences were carried out by exploiting biological 
pores but, as mentioned in the last example, recent advances in technology have 
opened new possibilities toward the use of artificial solid state nanopores. 

A great advantage of protein biopores is that they can be chemically engi- 
neered with advanced molecular biology techniques, such as mutagenesis [39]. 
Using this approach a wide variety of a-hemolysin biosensors were developed, 
especially to probe binding affinities [30, 37, 40] and to reach discrimination at 
single-base resolution on isolated strands of DNA [41, 42]. Moreover, the ge- 
ometry of a biopores is well-known and is highly reproducible, that is, once it 
has been determined by using, for example, x-ray crystallography, it can be as- 
sumed to be the same for all the biochannels of that particular type. Finally, 
toxin pores like aHL or anthrax [23, 36, 43] embed spontaneously in lipid bi- 
layer membranes. However they exhibit a number of disadvantages too. Typi- 
cally the pores, but mainly the membranes that host them, can become unstable 
if changes are produced in the external environment in terms of temperature, 
pH or denaturant concentration (used to unfold the proteins in solution). Also, 
it is very difficult to imagine portable devices which sensing area is made of bi- 
ological pores, due to instability of, above all, the lipid bilayer. In addition, their 
fixed size can also represent a limit, because geometry and dimensions cannot 
be macroscopically changed to address specific issues. 

Solid state nanopores present some advantages over their counterparts, such 
as increased reliability and lifetime, control over surface properties and pore ge- 
ometry (diameter, length and shape), compatibility with existing semiconduc- 
tor and microfluidic fabrication techniques with thus potential integration into 
ultra-high throughput devices [44, 45]. In fact, nanopore-based DNA analysis 
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Figure 1.3: Transmission electron microscope images of a solid state nanopore. Adapted 
from [44]. 


has the potential to carry out a range of laboratory and medical experiments 
even faster than current methods, also reducing the cost (fast DNA sequencing 
is an application of nanopore based sensing that took about 25 years to transfer 
the early insights into a commercial device, announced by the end of 2012 [46]). 
Figure 1.3 shows TEM images of silicon nanopores. The first synthetic biosen- 
sor was developed by Charles Martin’s group in 2005 and consisted of a single 
conical gold nanotube [47]. Since then, several teams have pursued this route, 
demonstrating the capability of solid state nanopores to perform single-molecule 
detection and to probe several characteristics of the proteins [38, 48-52]. 

Finally, as speculated by Branton et al. [53], the possible embedding of a 
biological pore into a semiconductor membrane represents a fascinating oppor- 
tunity that should be carefully considered and evaluated. 


4. Outline of the thesis 


The thesis is organized as it follows. In chapter 2 an overview on the state 
of the art of voltage-driven translocation experiments on proteins is provided, 
along with a summary of numerical investigations concerning coarse grained 
molecular dynamics. Specifically, the work by Oukhaled et. al [23] is described, 
since it is the experimental analysis that has motivated a substantial part of the 
numerical simulations performed during the present research activity. Also, nu- 
merical analyses [54, 55] are outlined for can be deemed the reference works 
on the topic available at the beginning of the thesis. Chapter 3 shows molecu- 
lar dynamics results on protein stretching and translocation [56]. Chapter 4 is 
dedicated to standard one-dimensional modeling of the translocation process, 
in terms of driven diffusion of a proper reaction coordinate in the associated po- 
tential of the mean force [57]. Main theoretical aspects are also considered along 
with the exposition of the numerical results obtained. Finally, the last chapter 
contains the three-dimensional continuum analysis performed to enrich the just 
mentioned standard view. The analysis is performed in light of the mechanics 
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of complex materials [3] and it is in particular inspired by two previous works 
[58, 59]. 
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Chapter 2 
State of the Art 


1. Preamble 


This chapter principally reviews the main experimental work that consti- 
tutes the baseline of our research activity, to which the whole section 2 is dedi- 
cated. 

However, a more general description of both experimental and numerical 
investigations is provided in section 3, where also a popular coarse grained 
model somewhat similar to the Go model is outlined. It is worth to anticipate 
that the Go model, sec. 2.1, constitutes the framework selected for the numerical 
analyses presented in this thesis. Coarse grained models considerably reduce 
the computational time with respect to full atoms simulations, which are not in 
turn considered in the present setting since typical translocation time scales 
are yet not accessible, unless the pulling force is order of magnitudes greater 
than the biological ones [60-62]. In addition, the extrapolation of these results 
to the lower-force regime is a very difficult task. Moreover, the most recent 
and remarkable studies refer primarily to the analysis of folding/unfolding 
pathways [63-65]. The advantage of a reduced complexity with respect to 
full-atomistic techniques relies on the possibility of massive sampling of events, 
thus allowing a statistical mechanical description of translocation in terms of 
ensemble averages, as it will be shown in chapter 3. 

Finally, the last section summarizes two numerical investigations of protein 
transport across nanopores that can be considered as the reference works in the 
minimalist molecular dynamics setting for protein translocation available in the 
literature at the beginning of the research activity. 


2. Experimental work: a selected example from the literature 


Stochastic biosensing has been exploited for detection of not only small 
molecules or DNA strands but also to study long and complex biopolymers. 
Since our interest is focused on protein experiments and modeling, all the wide 
literature on oligonucleotides, although extremely interesting, is not further 
considered. 


Marco Bacci, Coarse-grained molecular dynamics and continuum models for the transport of protein molecules 
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The transport of a folded polypeptide is characterized by a greater energetic 
barrier than for an unfolded structure. According with this argument, a recent 
study combined single-molecule electrical measures with simulations based on 
Langevin dynamics to show that highly unfolded 6-hairpin polypeptides enter 
the pore in a linear fashion, pursuing fast single-file translocations. In contrast 
the passage of structured 8-hairpin polypeptides occurs more slowly, producing 
longer current blockades [35]. 

Similarly, Oukhaled and co-workers [23] studied the translocation of folded, 
partially folded and unfolded structures of the Maltose Binding protein’ (MBP) 
across the wHL channel (Fig. 2.1). The unfolded structures were accomplished in 
high concentrations of guanidinium hydrocloride, a chemical denaturant com- 
monly exploited in protein folding analyses. It is worth to disclose that this ex- 
perimental work constitutes the starting point for the numerical investigations 
presented here. In this experiment, translocation has been studied by varying 
the denaturant concentration in the buffer solution, highlighting that current 
blockades were dependent on this parameter, both in terms of frequency and 
duration. The authors have observed, among other things, that folded proteins 
did not translocate across the biopore. The radius of gyration of the native MBP is 
indeed greater than the aHL lumen and the driving force was not strong enough 
to accomplish the protein unfolding at the pore mouth (the voltage required to 
unzip the fully native MBP would have destroyed the lipid bilayer of the exper- 
imental apparatus). Therefore no ion current drops were detected for low levels 
of denaturant in solution, panels a and c of Fig 2.1. With the increasing of the 
guanidinium hydrocloride concentration, the frequency of translocation events 
increases, following a step-like denaturation curve. Also, the average duration of 
long blockade events reduces, up to reach the phenomenology depicted in panel 
f where practically all the MBP molecules are unfolded in the bulk and accom- 
plish translocation in a very short time ~ 1ms. In between these states (fully 
native and fully denatured), complex phenomenologies occur: short and long 
current blockades coexist, testifying the presence of partially folded structures 
in solution, the characteristic transport time of which is order of magnitudes 
greater than the one of unfolded polymers, panels d and e. The mechanical na- 
ture of such bottlenecks of translocation has been deduced since stalls shortened 
in time when the applied voltage was increased. Moreover, the average translo- 
cation time decreases with the increasing of the denaturant concentration. The 
latter evidence excludes the possibility that the MBP stuck along the pore, due to 
chemical bonds, a phenomenon that could in principle generate long blockades 
but that is also thought to increase if the denaturation degree of the molecules 


lIn nature, MBP is responsible for the uptake and efficient catabolism of maltodextrins in the 
well-known bacterium Escherichia coli, increasing the solubility of recombinant proteins when ex- 
pressed as MBP-fusion proteins. It is a monomeric periplasmic globular protein with two lobes (the 
C and N lobes) that close to form the binding site. It is a single chain of 370 amino acids with no 
disulphide bonds. Its molecular weight is approximately 42.5 ku. 
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Figure 2.1: Adapted from [23]. Left side: Snapshots of the system and measurements. 
Panel a: Schematic representation of Maltose Binding protein and a-hemolysin without 
Gdm-HCl in solution. The current is constant at 100pA (open pore conductance) and the 
MBP can not enter into the pore. Panel b: With [Gdm - HCl] = 1.35M, well above the 
unfolding transition, [0.8M : 1.0M], the current trace decreases down to 20pA when 
an MBP molecule is in the channel. Right side: Part of current traces generated by the 
passage of MBP through the a-hemolysin pore as a function of the Gdm-HCl concentra- 
tion. Panel c: At [Gdm - HCl] = [0.8M] MBP practically does not translocate across 
the channel and no current drops are detected. Only few short blockades are observed, 
their average duration is 0.2ms Panel d: For [Gdm - HCl] = 0.85M, long blockades 
of 100s in average alternate with a series of short ones of 0.2ms in average. Panel e: At 
[Gdm - HCl] = 0.9M, the frequency of the short blockades increases (of average du- 
ration 0.2ms). The frequency of the long blockades also increases, their distribution is 
wide, their mean duration decreases, its value based on the statistical analysis of all events 
is 40ms. Panel f: For [Gdm — HCl] = 1M, only short blockades (of average duration 
0.5 ms) are observed, MBP molecules are almost completely unfolded. The frequency of 
short events does not further increase. 


increases, for the hydrophobic chains of the residues can more easily interact 
with the pore walls. 

As already stated, translocation of Maltose Binding protein through the a- 
hemolysin pore is the baseline for the research activity performed. Numerical 
investigations have the primary aim to explore the complex phenomenology just 
described in order to point out the role of denaturation on the transport across 
nanochannels. 
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3. Additional past work outline 


The aim of the current section is to touch on additional numerical and ex- 
perimental works that played a remarkable role during the research activity. 

From an experimental perspective, voltage-driven translocation of 
polypeptides is more problematic than for DNA or RNA strands so that 
numerical and theoretical support is extremely important. The main difficulties 
arise from the fact that proteins, differently from nucleic acids, are not 
uniformly charged molecules and, in addition, their natural tendency to form 
compact conformations or aggregates usually interferes with the passage 
through narrow paths, generating consequently a rich (almost overwhelming) 
collection of non-trivial phenomena [24, 28, 38, 51, 54, 56, 66] (Fig. 2.2). 
Despite the difficulties, several research groups have showed that voltage-driven 
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Figure 2.2: Conformations of Maltose Binding Protein (MBP) pulled across a a- 
hemolysin channel (aHL) computed via a reduced-model simulation. The pore is por- 
trayed as a cylinder. The protein description is reduced to its Ca-backbone. The chain 
needs to unfold to translocate from the cis to the trans-side. The image was made using 
VMD software [14]. 


translocation can be achieved, not only for nucleic acids, but also for different 
peptides and proteins [23, 34, 36, 38, 52, 67-69], and other studies show that 
this technique is able to yield substantial structural information [70], to 
characterize protein charge [71] and to discriminate different degrees of folding 
or aggregation [23]. However, except for some encouraging evidences that 
nanopores can detect site mutations on proteins [70], it is uncertain whether 
current blockages convey the necessary information to resolve each of the 20 
amino acids in order to achieve protein sequencing. 

However, the number of experimental efforts is quickly multiplying [24, 72] 
due to the obvious biological and biomedical advances expected in this research 
field [17, 18, 73]. Therefore from both a theoretical [74-79] and computational 
[11, 80-83] perspective, nanopore technology represents a challenging field to 
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develop methods able to interpret experimental results and to support the opti- 
mization and design of nanopore devices [74-79]. 

Sometimes simulations and theoretical methods allow bypassing the exper- 
imental limitations, such as restricted access to the finer details of the dynamics 
and irreproducibility of specific phenomena due to the instability of the organic 
matter out of physiological conditions. There are various theoretical strategies to 
tackle the transport of proteins across nanopores, starting from the microscopic 
description. Atomic-scale simulations are extremely informative about the pro- 
tein dynamics for they take into account the more thorough structural and in- 
teraction details of the whole system: protein, nanopore, lipid bilayer and ionic 
solution [84, 85]. The high system complexity, however, prevents atomic-scale 
simulations from cumulating the necessary number of events for a meaningful 
statistical mechanical description, even for small translocating molecules. Simu- 
lations via coarse grained models of protein transport across nanochannels con- 
stitute an interesting alternative to full-atom methods. As already mentioned, 
translocation is practically always coupled with unfolding of the biomolecule. 
The simplest model to account for such a coupled process is to assume that both 
translocation and unfolding are accomplished by a constant mechanical force. 
Even this approach is computationally challenging and not devoid of drawbacks, 
because biologically or experimentally relevant time-scales are orders of magni- 
tude greater than those reproducible by full atoms simulations and, in addition, 
the constant force assumption could not be realistic to describe the transloca- 
tion in cells. In fact, typical simulations time scales range from nanoseconds to 
microseconds (whereas experimentally measured translocation events are of the 
order of milliseconds). Also, it has been found that repetitive pulling actually 
catalyzes protein import into mitochondria, suggesting that this could even be 
a strategy selected by evolution to import proteins into organelles [86]. Specifi- 
cally, as suggested in [86], the interplay among cyclic operation of molecular mo- 
tors, force-induced backbone strain, and non-native interactions might be of es- 
sential importance in mitochondrial import and not only, as pulling/pushing of 
proteins by molecular motors is widely observed in the delivery of biomolecules 
across several organelle membranes. 

A simplified phenomenological description of nanopores, proteins and 
interactions, allocating a reasonable amount of computational resources, 
provides an immediate inference on the average pathway of the process. 
Moreover, the limited amount of CPU loading allows a massive statistics of 
translocation events to be collected. Clearly, the major limitation of coarse 
grained models lies just in their phenomenological nature, because the 
non-trivial simplifications introduced have to be compensated by an extensive 
input of experimental information in order to improve the predictive power of 
the models and their ability to match experiments. Likely, a suitable strategy 
to investigate theoretically a complex and articulated phenomenology, as the 
one emerging from the fast-developing nanopore technology, would require 
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an integrated approach that cleverly combines coarse-grained modeling with 
full-atom simulations. In this direction the most famous example of a coarse 
grained molecular model that directly derives from full-atom force fields (and 
therefore is much more complex that the approach here actually used) is the 
Martini model [87, 88]. 

A burgeoning development of coarse-grained approaches to heteropolymer 
modeling has taken place in recent years [89]. The basic feature of protein mini- 
malist models is to group together and identify side-chain atoms, or even a com- 
plete residue, into simpler units (virtual atoms, beads). These models are sim- 
ple, fast to implement and require relatively small computational resources as 
they take into account only the C,,-carbon backbone dynamics. In this context, 
the Sorenson Head-Gordon (SHG) model [90] is an off-lattice model that gen- 
eralizes a previous model introduced by Honeycutt and Thirumalai [91]. The 
a-carbons are represented by three possible types of beads: hydrophobic (B), 
hydrophilic (L), and neutral (N). The force responsible for the collapse onto a 
compact structure is the attraction between B-beads, whereas all other pairs of 
interactions are repulsive and determine the rearrangement of the folded struc- 
ture onto the native topology. The long-range interactions between far apart 
residues in the sequence are modeled through the potential: 


m 12 6 
Vir= Y eSi (=) -25(2) (2.1) 
i,j>i+3 Tij Tij 


where the summation ranges from 1 to the number m of residues, e sets the 
energy scale and ø is the excluded volume size used to obtain a self-avoiding 
polymer chain. Also, r;; = r;—r; is the distance between residues (beads) i and j. 
The attractive forces between hydrophobic beads is attained by setting S1 = S2 = 
1 for BB pairs. The others are all repulsive: LL, LB are characterized by S = 1/3 
and S> = —1 and NB, NL by Sı = 1 and Sə = 0. In order to achieve the proper 
secondary structure, bending and dihedral interactions, which surrogate side- 
chain packing and hydrogen-bonding, are introduced. The analytic expression 
of the dihedral potential is 


Vain = Se A + cos ġ;) + Bi(1 - cos ¢;) 
+C;(1 + cos 3¢;) + Di(1 + cos(¢; + 7/4))], (2.2) 


where @; indicates the angle between the two adjacent planes identified by the 
positions of four consecutive beads. The information on secondary structures 
is stored in the coefficients A, B, C, and D that determine a bias on the angles 
reflecting the tendency of residues to form one of the secondary motives: helical 
(H), extended (E), turn (T). Therefore, the primary structure must be comple- 
mented with the auxiliary sequence of E, H or T symbols, assigning the appropri- 
ate set of coefficients. Such a decoupling between primary and dihedral sequence 
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allows a fine structural design [92]. The bending angle interaction is modeled as 
a harmonic potential: 


m—-2 


k 
Viend = >, 3 (i-o), (2.3) 


i=l 


with a constant kg = 20e?/(rad) so that large deviations from the equilibrium 
value ĝo = 104.85° are unlikely. The SHG force field is completed by stiff springs 
with equilibrium distance do = 3.8 A, namely 


m-1 


k 
Vpept ("i i+1) = >, g (Tinie - do)’, (2.4) 


i=1 


used to maintain chain connectivity by simulating the presence of covalent pep- 
tide bonds between successive amino acids. In the SHG model a selected num- 
ber of elements is used to capture the essential characteristics of the proteins. 
However, some features, such as hydrogen bonding and side chains, are missing. 
These limitations should be compensated through a design strategy for optimiz- 
ing the sequence [93]. SHG models have been adopted variously in studies on 
protein unfolding (in free and nano-confined geometries [54]) and transloca- 
tion [66]. In the next chapter the coarse grained model (G6 model) used to per- 
form the research activity here reported is considered, along with all the other 
assumptions employed. A critical comparison between the performance of the 
Go and SHG models in reproducing the folding of the small WW-domain form 
in the HPinl protein can be found in [94]. 

Huang, Kirmizialtin and Makarov [54], using a SHG force field, compare the 
mechanical unfolding of Ubiquitin in free space (AFM?-like stretching) and in 
a semi-infinite pore, and show that the two dynamics are quite different. More- 
over, the authors observe that the unfolding pathway in the pore is strongly de- 
pendent on both pore diameter and pulling terminus. The free energy profile as 
a function of the position of the pulled amino acid along the pore axis presents 
several branches corresponding to ‘unfolding intermediates’ before a plateau is 
reached once the whole protein has entered the pore. As we shall see, similar 
intermediates are also observed as ‘stall events’ in the present research activity in 
a finite-size pore [56]. Due to its structural simplicity, Ubiquitin was used also in 
other studies, for instance Ammentiet al. [55] simulated Ubiquitin translocation 
across a long pore using the Go force field. The latter two examples [54, 55] are 
further considered in the next section as are deemed to constitute the reference 
coarse grained computational studies available at the beginning of the research 
activity. A brief introduction is provided to delineate the basic mathematical 
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frameworks that are still missing to perform an albeit qualitative description of 
these works. 


4. Analytical models and simulations of protein translocation 
4.1. Overview 


A common analytical approach to model complex phenomena such as the 
translocation process, is to investigate the low-dimensional potential of the mean 
force landscape, which is relevant to suitably-defined reaction coordinates. This 
approach is often coupled with simulations where simplified models for the pro- 
tein and its surroundings are exploited, both to investigate the dynamics of the 
system and to reconstruct the free energy profile, which governs the trend of 
the potential of the mean force as it will be clear later on. As developed fur- 
ther in chapter 4, a reaction coordinate is just a collective order parameter that 
encompasses and summarizes multi-dimensional phenomena, usually between 
two end-states. The choice of the reaction coordinate is often guided just by intu- 
ition and aim of the analysis. In forced dynamics a natural selection is to choose 
the degree of freedom that couples with the driving force (usually it is the com- 
ponent along the force direction of the position vector of the pulled residue of 
the protein). For umbrella sampling applications [95] to translocation, however, 
the center of mass of the molecule is often the preferred parameter [55]. For 
mechanical stretching, a situation mimicking AFM pulling studies, the protein 
extension along the direction of the applied force is often implemented as a reac- 
tion coordinate. When the free energy landscape enters in the formulation of the 
potential of the mean force, which is the natural (and sole) option for canonical 
systems such as a single molecule in a heat reservoir, an implicit assumption is 
always done: the translocation can be presumed to take place in thermal equi- 
librium at any location along the pore, so to obtain a quasi-equilibrium process. 
It is however possible that a protein, while inside the tunnel, would be trapped 
in a metastable state so that its behavior is practically non-ergodic. The quasi- 
equilibrium assumption allows one to avoid dealing with the actual kinetics of 
the translocation, that can be traced back to a series of quasi-static states, as the 
ones obtained by sampling the system with the umbrella sampling method. In 
this context the free energy trend, Go(z), (here the reaction coordinate is as- 
sumed to be just one for simplicity and it is indicated by z without loss of gen- 
erality) is related to the probability p(z) to find the chosen reaction coordinate 
at z (a collection of ‘equivalent’ macro-states of the system correspond to such a 
value of the collective variable) by the following relation 


Go(z) = -kT Inp(z), (2.5) 
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where ky is the Boltzmann constant and T the temperature of the heat reservoir. 
When external fields act on the system, the associated reversible work W (z) has 
to be taken into account by including in the previous expression a proper term. 
For instance, in the case of the mechanical stretching induced by a constant force 
f, exemplified in the next paragraph, the potential of the mean force reads as 


Gs (z) = Go(z) + W(z) = -keT Inp(z) - fz (2.6) 


(and now it is evident the reason why the free energy profile governs the po- 
tential of the mean force trend). Since continuum approaches are delved deeper 
later on in the thesis, the topic is not further considered in the present section. 
It has had the only aim to introduce the idea underneath the 1D continuum ap- 
proach related to the already mentioned numerical investigations [54, 55]. Such 
a description is carried out in the next section. 


4.2. Coarse-grained molecular dynamics reference studies 


Huang, Kirmizialtin and Makarov [54] performed an analysis relevant 
to protein import into mitochondria using Langevin dynamics simulations 
(see (3.8b)) of a coarse-grained off-lattice model, the already introduced 
SGH model, to investigate the co-translocation unfolding of a protein 
structure (the 76 amino acids long Ubiquitin), pulled mechanically through a 
semi-infinite narrow pore, cylindrical in shape. It is emphasized that, despite 
the computational savings, due to the minimalist assumptions, biologically 
relevant time scales associated with barrier-crossing events are rarely accessible 
via direct simulations. In this work the potential of the mean force just 
introduced is exploited, i.e. equation (2.6), and the chosen reaction coordinate 
coincides with the z-axis coordinate of the pulled residue (the z-axis coincides 
in turn with the axis of symmetry of the pore). ‘Thus the translocation is 
assumed to be slow enough with respect to the protein internal dynamics to 
be described as a one-dimensional driven diffusion along the pore axis in the 
potential G;(z). To obtain the global shape of the free energy Go(z), the 
umbrella sampling/weighted histogram method [96] is used. 

In [54] the free energy profile resulting from translocation is also 
compared with the one stemming from mechanical stretching (AFM-like 
experiments) simulations and differences are discussed. It is in fact proven that 
significant differences exist between these two cases and that, in particular, 
AFM experiment outcomes cannot be easily scaled to match translocation 
results as each process occurs following distinct pathways. For details on 
amino acid interactions see section 3. Without entering the details of the 
calculations, Figure 2.3 panel a shows the profile of the potential of the mean 
force in the mechanical stretching case, for different values of the force (here 
the reaction coordinate is the molecule extension along the force direction). 
In this figure, contact maps are also depicted. These are the plots of the pairs 
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of residues (i,j) such that their distance, in both the native and current 
configurations, is lower than (in this analysis) 7.5A and 9.125A respectively. 
Also, |j — i| > 3. Such residues interact following the long range potential (2.1). 
In panel b the same situation is showed for the translocation process when the 
N-terminus is pulled inside the pore. In the article [54] the pulling from the 
C-terminus is also studied?. While the overall free energy cost of extending the 
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Figure 2.3: Adapted from [54]. Panel a: Free energy profile of Ubiquitin in units of en 
as a function of the mechanical stretching reaction coordinate for different values of the 
force. The minima correspond to the native-like state 0 and unfolding intermediates 1 and 
2, which structure is depicted in the lower part of the figure, where also contact maps are 
showed. Panel b: Translocation of Ubiquitin driven by a force applied to its N-terminus. 
The potential of the mean force is plotted as a function of the position of the reaction 
coordinate, corresponding to the just mentioned end of the molecule. Native-like 0 and 
intermediate structures 1-3 are shown with the relevant contact maps. 


protein is similar, the profiles of Gy(z) and, consequently, the force-induced 
unfolding mechanisms are different. The local minima correspond to unfolding 
intermediates that are obviously distinct. Certain similarities exist between 
mechanical stretching and translocation pathways, especially in the first step, 


>The N-terminus refers to the start of a protein or polypeptide terminated by an amino acid with 
a free amine group (-NH2). The convention for writing peptide sequences is to put the N-terminus 
on the left and write the sequence from N- to C-terminus. When the protein is translated from 
messenger RNA, it is created from N-terminus to C-terminus. The latter is in turn the end of an 
amino acid chain (protein or polypeptide), terminated by a free carboxyl group (-COOH) 
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transition from structure 0 (native-like) to structure 1, which is present in both 
cases and concerns the same event (separation of two terminal parallel strands). 

In this study the dependence of the translocation time on the driving force 
is also examined, in addition to pore-size effects. Actually only a crude estimate 
of the effect of the driving force on the blockage time is carried out, as the au- 
thors assume that the average translocation velocity should be correlated to the 
overall free energy barrier, namely AG(f)= max G f(z) -minG'y(z), encoun- 
tered along the reaction coordinate. The unfolding free energy barrier does not 
depend linearly on the force, as it is often conjectured in mechanical stretching 
studies. The pore investigation is not considered here for brevity; it is just worth 
mentioning that if the pore is wide enough, the protein can perform transloca- 


f=0.5f, 


S=15/, 


-20+ 


o 
_ 
> 
S 
> 
Yw 
© 
è 


2w 40 60 20 40 60 
i i 


Figure 2.4: Adapted from [54]. Translocation of Ubiquitin across a large pore. The reac- 
tion coordinate used to project the free energy profile is the position ofthe last amino acid 
of the chain along the pore, the N-terminus. Structures representative of the molecule as 
encountered during simulations are also depicted with their contact maps. 


tion in a semi-folded fashion, for example in the way showed in Figure 2.4. 

As stated by the authors themselves, several limitations may prevent the di- 
rect comparison of the numerical results with the experimental ones. These are 
subsequently listed: 


e The minimalist protein model may be too rough to reliably estimate the 
translocation free energy barriers and their force dependence. 


e The analysis concerns a semi-infinite pore, which is unrealistic. 
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e The constant force assumption might be inadequate, even for 
voltage-driven translocation experiments. 


e The assumption that translocation is a slow barrier crossing process may 
be unrealistic, especially in the case where the pore is wide and the result- 
ing free energy barrier is small. If the translocation time scale is of the 
same magnitude of internal dynamics time scales, the simple description 
of the one-dimensional driven diffusion process in the equilibrium po- 
tential of the mean force is no longer applicable and a full simulation of 
translocation dynamics may be required. 


e More realistic pore models may be called upon in order to apply compu- 
tational results to the natural phenomena that take place in cells. 


For completeness, it is worth mentioning a work where translocation of 
Ubiquitin is studied both in its thermodynamics and average kinetics [55]. In 
this analysis a pore of finite length is considered, thus tackling the issue, although 
it is long enough to accommodate the unfolded protein, i.e. if some residues 
still dwell on the cis-side, none has yet escaped the channel from the trans-side 
aperture. In [55] statistical analyses are also carried out, accounting for block- 
ade time distribution and translocation probability, by following the analytical 
method just sketched, in terms of first passage time statistics (again this per- 
spective will be delved deeper further on in the thesis, chapter 4 sec. 2.1). In 
the remaining part of this section the main points of [55] are summarized. The 
model employed for the protein is a Go-like one, see sec. 2.1 of chapter 3, while 
the pore confinement is again described by a cylindrical soft-core repulsive po- 
tential. The Langevin equation of motion still governs the simulation dynamics 
and umbrella sampling is performed by constraining the center of mass of the 
molecule with a harmonic potential. Multiple weighed histogram technique is 
use to deweight the umbrella sampling results [96]. 

Translocation simulations are run until the protein is fully expelled out of the 
right end of the channel and almost complete refolding takes place. However, 
at low forces, Ubiquitin may fail to cross the pore within the simulation time 
window. The probability of translocation, Pr,, is estimated as the number of 
translocation success over the total number of runs as a function of the driving 
force and of temperature. Figure 2.5, panel a, shows the curves of Pr, for two 
different values of the temperature, reporting a step-like trend, which delineates 
a clear value of the critical force, i.e the force for which Pr, = 1/2. For instance, 
the decreasing of the critical force with temperature is a tangible consequence of 
the lowering of the free energy translocation barriers with temperature increase. 
Panel b in the same figure contains the trend of the average translocation time 
as a function of the pulling regime and for the two values of the temperature 
simulated. Only in the high force regime the average translocation time shows 
an Arrhenius-like dependence on the force, i.e. T © To exp(-GFL), with 8 = 
1/(kyT'), F the pulling force and L the pore length. 
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Figure 2.5: Adapted from [55]. Panel a: Ubiquitin translocation probability as a function 
of the pulling force for two values of temperature. Pore length = 300 A, pore radius = 4 A, 
fu = 6 pN, Tref = 198°K and Tpn = 305°K. Panel b: Average translocation time 7 of 
Ubiquitin as a function of the force F at the two simulated temperatures. tu = 0.25 ps. 


Characteristic trends of the free energy (indicated as G(x)), obtained from 
umbrella sampling simulations and multiple weighted histogram analysis 
method, are reported in Figure 2.6. These values refer to the position of the 
center of mass of the protein along the pore axis. The plateau indicates that 
once the protein is unfolded and completely inside the channel, it can slide 
along its axis without further free energy increase. This is a consequence of 
the pore geometry, for it is long enough to fully host the protein even when 
completely unfolded. In contrast, as expected, the main variations of the free 
energy occur at pore boundaries. In the proximity of the right end, when 
actually some of the residues are already located outside the pore, the molecule 
starts to be spontaneously expelled. 
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Figure 2.6: Adapted from [55]. Free energy trend obtained from umbrella sampling sim- 
ulations with a harmonic potential as umbrella potential in units of e. The two curves 
refer to the two simulated temperatures (reference and physiological). 
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However, despite the insights gained, the general relationship between 


the structure of the proteins and their resistance to mechanically driven 
co-translocation unfolding remains poorly understood. 
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Chapter 3 
Molecular Dynamics Translocation Simulations 


1. Preamble 


A set of translocation simulations and the description of the relevant nu- 
merical tools are reported here. The attention is focused on driven transport 
of Maltose Binding protein (MBP) across nano-channels in the framework of 
coarse-grained modeling. 

As already mentioned, this work is motivated by a recent experiment on 
the MBP voltage-driven transport across nanopores which explores the influ- 
ence of denaturation on translocation pathways [23]. Specifically, short and long 
channel blockades, associated with the translocation of denaturated and partially 
folded MBP conformations respectively, are identified by large differences in the 
ion current drop durations. 

Dynamically, translocation of proteins strongly depends on the denatura- 
tion degree and generally is coupled with an unfolding stage. In fact a folded 
protein with gyration radius larger than the pore narrower section needs a com- 
plete or partial unfolding to start the translocation [38, 51]. 

Structurally, MBP is a monomeric globular protein with 370 residues, resolved 
through x-ray crystallography by Quiocho, Spurlino and Rodseth [97]. Mechan- 
ical AFM pulling experiments by Bertz and Rief [98] identified some structural 
regions - named unfoldons by the authors - as resistant areas to mechanical de- 
naturation. The domains M1, M2, M3 and M4, shown in Figure 3.1A on the PDB 
structure (PDB-ID: 4mbp), can be related (one can expect such a thing) to the 
long blockade events in the MBP translocation. In computations we consider for 
MBP the Go-like model described in section 2.1, as a natural approach to assess 
the impact of the molecule structural properties on translocation. The advan- 
tage of a coarse-grained description with respect to atomic scale models relies 
on the possibility to explore a large number of denaturation and pulling condi- 
tions, accumulating this way robust statistics of translocation events. For recent 
applications of coarse-grained models to various biochemical processes see the 
review [99]. 

As first step we show that the model is able to reproduce the general features of 
MBP chemical denaturation. Afterward we study the translocation process in 
a pore reproducing the average dimensions of the @HL channel. We find the 
translocation dynamics to be strongly affected by the protein denaturation state, 
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Figure 3.1: Adapted from [98]. A) Maltose Binding Protein structure with the four un- 
foldons highlighted. M1-darkest gray (residues: 296-370), M2-light gray (residues: 244- 
295), M3-dark gray (residues: 1-113), M4-lighter gray (residues: 114-243). Panel B reports 
the positions of the five stall-points found in our translocation runs (residues 7, 63, 114, 
267, 335) in a 2D-topological view. 


similarly to the experimental evidence. In particular, the translocation of chem- 
ically unfolded MBP conformations requires relatively low forces and once the 
pulled terminus enters the pore, the transport proceeds uniformly. In contrast, 
native-like structures exhibit a richer phenomenology: stronger forces are re- 
quired to trigger the transport that, once started, develops in a stick-and-slip fash- 
ion, through bottlenecks and jerky movements caused by the rearrangements of 
the folded part of the protein that has not yet engaged the pore. In this case 
the issue is to identify the MBP structural motives responsible for the translo- 
cation slowing down. First we exclude these stalling stages to be related with 
the unfoldons, by showing that translocation pathways significantly differ from 
the free space unfolding dynamics. Then, by means of an analysis of static and 
dynamic native contact maps, we show that the stall points of the translocation 
pathway are mainly due to the protein regions more dense in long range native 
interactions. This result might constitute a possible criterion to predict a-priori 
some statistical features of protein translocation from a simple static structural 
analysis. 


2. Three-dimensional models for MBP simulations 


In order to simulate numerically translocation phenomena, a coarse-grained 
computational model for the Maltose Binding protein and its surroundings is 
developed (Fig. 3.2). As it is described below, choices are made in terms of the 
importing mechanism, pore and particle interaction potentials, numerical inte- 
gration scheme and equation of motion for the material points constituting the 
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system. Finally, the approaches followed in the practical implementation of the 
numerical simulations are also expounded. 


2.1. MBP numerical model: The Go-like approach 


The phenomenological off-grid model proposed by Nobuhiro-Gò and 
Harold A. Scheraga (minimalist off-lattice native-centric Cœ Go-model) [1, 100] 
is a coarse-grained model that identifies all the amino acids with their Cy 
carbon atoms (Fig. 3.2 panel c). Therefore, the protein is reduced to a sequence 


Figure 3.2: Panel a: Maltose Binding protein appearance according to a full atom repre- 
sentation. Panel b: Secondary structure of the MBP evinced from its protein data bank 
file, PDB-ID: 4mbp. Panel c: Amino acid core structure. The characteristic residue chain 
of this elementary constituent is summarized by a box surrounding a capital R. The Ca 
carbon atom binds directly with the residue chain. Panel d: MBP coarse grained represen- 
tation. Accordingly to the Gd model, only the Ca carbon atoms along the main backbone 
chain of the protein are depicted. 


of material points (also called beads) coinciding spatially with the C atoms of 
the protein backbone and no side chain characteristics are retained (panel d of 
the same figure). The atomic mass is the same for all the residues and equal to 1 
in the so-called ‘code measure units’ or ‘internal units: 
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A lot of variances and refinements of this model are available in the current 
literature (see e.g. [2, 101-105]). All these approaches are generally referred as 
Go-like models. Here the approach presented in [2] is implemented. 

The minimum energy is assigned to a particular reference configuration, 

usually coincident with the native structure of the considered protein. ‘This is 
why the G6 model can be properly applied only when the crystallized structure 
is already known and the folding dynamics are mainly determined by the topol- 
ogy of the natured conformation, as it happens with small globular proteins. The 
Go-like model generates an almost ideal folding pathway (energetic traps associ- 
ated with metastable conformations able to hamper the folding process are prac- 
tically absent). Also, the native configurations so obtained are stable and close to 
the reference one (in terms of root mean square deviation [106]), which can be 
extracted, for example, from the protein data bank. It is a particularly suitable 
approach to simulate two-state folding dynamics, but it has been used also to 
predict intermediate configurations [2]. 
Thanks to its properties and to its simplicity the Go model has been widely ex- 
ploited for protein numerical analysis [55, 94, 107-115]. The main limitation of 
the approach is the requirement that the native tertiary structure of the protein 
has to be known. In addition, its topological nature does not account for chemi- 
cal factors, determining a considerable disagreement between real and simulated 
time scales: kinetic peculiarities of the folding process are not properly predicted. 
These discrepancies do not alter the folding thermodynamics and the correct se- 
quence of events. In most cases the model allows the correct assessment of the 
free energy values and of folding pathways. Moreover, transition states are co- 
herent with the ones experimentally determined. 

In the remaining part of this section the practical implementation of the 
model is presented. Following the G6-like approach selected, the potential acting 
on the residues of the MBP is constituted by four parts. 


e Peptide potential (or bond potential), Vp. 

e Bending angle potential, Vo. 

e Twist angle potential, Vy. 

e Non-bonded interactions (Lennard-Jones potentials or barriers), Vno. 


Let r;, r? (with è = 1, ..., m) be the position vectors of the m residues identified 
by their Ca carbon atoms, in the reference (native) and current configurations, 
respectively. The peptide potential, responsible for the covalent bonds between 
the beads of the polymer chain, has the following representation: 


k 
Vo(ri,ii) = p ("hirn ria), (3.1) 
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Figure 3.3: Panel a: Schematic view of the peptide bond. Panel b: Sketch of the bending 
angle interaction. Panel c: Representation of the twist angle formed by the two planes 
determined by four consecutive material points of the protein structure (three by three). 


where rji41 = Ti- Trib ri; „1 =r? —r?., are the position vectors (their norm is 
the bond length) between the bonded residues 7 and ¿+1 in the instantaneous and 
native configurations, respectively (panel a of Fig. 3.3). Moreover, kp = 1000€/d3 
is the empirical constant in terms of the parameters do and e named, respectively, 
the equilibrium length and the unit of energy. In particular, dy = 3.8A is the 
average distance between two adjacent amino acids and e sets the energy scale 
of the model. In summary, the bond potential is nothing more than a simple 
harmonic potential. 

The angular bending potential Vg plays a fundamental role in recovering 
the secondary structure of the reference native configuration of the protein. It is 
mathematically equivalent to the peptide potential, where relative displacements 
are substituted by angular differences, namely 


Vo(0.) = Sho( Bi - OY, (3.2) 


where kg = 20e rad”? is the elastic constant expressed in terms of the energy 
computational unit and 6;, 0? are the bond angles formed by three adjacent 
beads in the simulated (time-dependent) and native conformations, respectively 
(Fig. 3.30). 

An additional contribution to the composition of the secondary structure is the 
dihedral potential, expressed as a function of the twist angles ¢; and ¢?, again 
referred respectively to the actual and crystallized conformations. The twist an- 
gle is the angle formed between the two planes determined by four consecutive 
amino acids (three by three) along the chain, panel c. The definition of the twist 
angle potential V% is 


Voldi) = kg [1- cos(oi — 4?)] + ky” [1 - cos3(d; - 6))], 833) 
where i = e and ho = €/2 are the dihedral constants. 


In the G6 model a distinction is made among the pairs of residues that interact 
following a potential that has also an attractive part, in addiction to a repulsive 
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one, set to take into account excluded volume effects, and the ones that, instead, 
only repulse each other sterically. The distinction is made on the basis of the 
native distances with respect to a parameter of the model, the so-called ‘cut-off 
radius, Re. If the native distance between two (i and j) non-adjacent residues, 
i.e. j > i + 3, is lower than the cut-off distance, they are considered to be in 
native contact and will interact following a 12-10 Lennard-Jones potential (3.4). 
In other words, such residues will interact attractively if brought to a distance 
greater than the native one and repulsively otherwise: 


o \12 o 10 
Ti; Kis 

Van) =f (2) -6 (22) | (3.4) 
Tij Tij 


where all the symbols have been already defined. When ro; > Ra the purely 
repulsive contribution V,,nat is assigned to the pair of amino acids considered. 
This term is based on excluded volume effects. It enhances the cooperativity in 


the folding process and takes the form of a Lennard-Jones barrier 


12 
10 {o 
Vanat (riz) = Fe — ’ 3.5 
dou) = Fe( = 65) 
where o is, similarly to the SHG model, a free length parameter correlated with 
the extension of the excluded volume (self-avoiding polymer chain). It is usually 
set equal to the average dimension of the amino acids, approximately 4.5À. 
The total potential acting on all the residues of the protein is then: 


m-1 m_2 m-3 
Ves(ri)= >> Vo(riist) + YU Volti) + >> Valei) + >> Vnolriz), 
1 i=l i=l i,j>i+3 


(3.6) 


where the non-bonded term Vne summarizes the possible long range interac- 
tions just described and reads as 


Va) = € 


In what follows e = 1 and Re varies in the range (3.0 + 7.5) À. 
2.2.Pore and pulling models 
The confinement effect on a protein driven inside a narrow space such as 


a nanopore can be represented by a step-like soft-core repulsive cylindrical po- 
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Figure 3.4: A step-like [0 : L] cylindrical soft-core repulsive potential V. is chosen to 
model the effect of the confinement inside the pore. A constant force is used to drive the 
MBP inside the channel. It is always applied to the foremost residue inside the pore. 


tential. For simplicity the cylinder axis of symmetry is set coincident with the 
x-axis of the frame of reference used for translocation simulations. 

The same direction is used to develop the mechanical pulling of the MBP 
by the application of a constant force to the foremost residue inside the channel. 
The constant action is thought to constitute a reasonable approximation for the 
average effect of the electrical potential in a voltage-driven translocation exper- 
iment. 

The channel potential is given in (3.7) and schematic representations of the 
system, the potential and the pulling mechanism, are shown in Figure 3.4. 


Vee -v(£) O[x(L-2)]}, 67) 


Here O(s) = [1+tanh(as) |/2 is a smooth step-like function limiting the action 
of the pore potential in the effective region [0, L]. L and R, are pore length and 
radius respectively. Also, p = \/(y? + z?) is the radial coordinate. A convenient 
choice of the other parameters is q = 1, a = 3A~? and Vo = 2e [114]. The driving 
force acts only in the region in front of the pore mouth (the capture one) x € 
[-2,0], and inside the channel [0, L]. Pore length L = 100A and radius R, = 
10A are taken from aHL structural data. 


2.3.Langevin dynamics for the material points of the MBP 


The choice for the dynamics of the beads is not different from what is usu- 
ally employed in coarse-grained molecular dynamics simulations for it is the 
well-known Langevin equation (3.8a) that governs the motion of the material 
points. Consequently, numerical investigations are performed at constant tem- 
perature. The overdamped limit is actually implemented, i.e. # = 0, (3.8b); a 
standard Verlet algorithm is used as numerical scheme for time integration [116]. 
In other words, any particle of the system follows a dynamics enforced by this 
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three-dimensional equation, when a numerical simulation runs: 


ři = Yi B Vr V (ri) + T(t) + Fest, (3.8a) 
=Ni i T Fez 
es a (3.8b) 
y 


Here y is the friction coefficient used to keep the temperature constant (also 
referred to as Langevin thermostat) and T(t) takes into account thermal 
fluctuations, being a delta-correlated stationary and standard Gaussian process 
(white noise). Practically, the latter quantity represents a random force that, 
along with the friction term, satisfies the fluctuation-dissipation theorem 
(namely the mean-square value of I° is proportional to the corresponding 
friction coefficient, y). V(r;) is the total potential on the material points 
(generally speaking it includes both the terms belonging to the Go-like model 
and to the potential used to model the pore confinement (3.6) and (3.7), 
respectively, depending on the position r; of the issued bead) and Fert takes 
into account the possible contribution of the external forces (in the present 
setting it coincides with the force driving the translocation, which acts only on 
the pulled residue). 

Several analytical models can be implemented starting from a Langevin ap- 
proach but, since this topic is developed when 1D-continuum modeling is con- 
sidered, it is not further discussed here. 


3. Methods for translocation and stretching simulations 


The following sections describe the practical implementations of the numer- 
ical investigations in the cases of mechanical pulling in a confined environment 
(translocation simulations) and in free space (AFM-like mechanical stretching 
simulations). Thermalization runs are also outlined when necessary as they have 
been used to obtain statistically independent initial configurations. 


3.1. Translocation simulations 


For each cut-off radius Re, thermalization simulations are performed at T = 
0.75 ĝu in the already mentioned ‘code measure units’ (symbols will be usually 
dropped in what follows) to generate initial MBP configurations with one of the 
two ends constrained near the pore entrance. More in general, in a thermaliza- 
tion (or equilibration) simulation the protein is left free to fluctuate and possibly 
to unfold (depending on the value of the cut-off radius) in the heat reservoir. 
Protein configurations are sampled at time intervals equal to 10% of the simula- 
tion time window, Tą = 10°t,, with t,, time units, to ensure statistical indepen- 
dence of initial states. After thermalization, the protein is driven by applying the 
pulling force F in the capture region and into the pore. 
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Protein translocation is considered accomplished when the last residue 
leaves the trans-side of the channel. As it will be shown in the result section, 
there are conditions where the molecule escapes the capture by diffusion, 
despite the action of the importing force. To save computational time, 
simulations are stopped and discarded if the protein reaches a distance from the 
pore entrance comparable to its native state linear size (approximately 20À). 
This criterion is based on a preliminary estimate of a negligible probability for 
the molecule to re-approach the pore by diffusion and re-start the translocation. 

To set the nomenclature up, such cases of protein escape will be labeled as 
loss events. Cases where the protein is neither translocated nor lost at the end of 
the time window are indicated as unsuccess. 


3.2. Stretching simulations 


Before each stretching simulation (AFM-like), the protein is equilibrated 
without constraints or additional forces in order to obtain a proper initial con- 
figuration. Stretching is simulated by using a constant velocity Steered Molecular 
Dynamics (SMD) strategy [117] where protein elongation is induced by a spring 
of elastic constant k = 0.le. The protein N-terminus is held fixed, the C-terminus 
is attached to the first end of the spring, which second end is dragged at constant 
velocity v in the direction of the initial end-to-end vector. To test the result- 
ing robustness of simulation protocols, we use different steering velocities and 
perform AFM-like stretching also from the N-terminus (with the C-terminus 
blocked). 

As mentioned in the introduction, MBP contains four unfoldons (M1, M2, 
M3 and M4, Fig. 3.1). To monitor the denaturation degree of the k-th unfoldon 
domain, we consider the number of its internal active native contacts, normal- 
ized to the corresponding value in the PDB crystallized structure, as a function of 
time. Two residues originally in contact are considered detached when their dis- 
tance in the actual molecule conformation exceeds 1.23r9;. The latter criterion 
is based on a negligible attractive force beyond such a distance, which approx- 
imately coincides with the point where the 12-10 Lennard-Jones potential (3.4) 
changes curvature. 


4. Translocation and stretching numerical results 
4.1. Denaturation characterization 

In the Gō-model force field (3.6) a decrement in the cut-off radius Re plays 
virtually the rôle of a chemical denaturing agent because, by reducing the num- 
ber of attractive long-range interactions (3.7), the native state is destabilized with 


respect to the action of thermal fluctuations. Denaturation can be also achieved 
by increasing the temperature or decreasing the energy scale e (thermal denat- 
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uration). The latter approach would be however too global to be compared to 
specific experiments [23, 27], because it would soften both bonded and non- 
bonded interactions (see eqs. (3.6), (3.7)). The denaturation we implement in 
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Figure 3.5: Panel a: Contour map of the RMSD as a function of the cut-off radius and sim- 
ulation temperature. Darker regions reflect states were the equilibrated protein structure 
resembles the native configuration. The transition from folded to unfolded state is a two- 
state transition and is extremely sharp, in compliance with the experimental findings on 
MBP denaturation. Panel b: Denaturation plot of MBP obtained by thermal equilibrium 
simulations at T = 0.75. The proper rescaling, (3.9), of RMSD makes simulation data 
to collapse onto Ganesh et al. denaturation plot [118] (fraction of unfolded structures by 


circular dichroism spectroscopy). 


such a minimal model context is the closest to the chemical one, as it reduces 
only the number of attractive non-bonded interactions responsible for the col- 
lapse onto compact native structures. That behavior mimics the chaotropic ac- 
tion of a solute (denaturant) which destabilizes the native states by competing 
with internal non-covalent protein interactions, such as hydrogen bonding and 
hydrophobic effect. 

We thus expect that laboratory denaturation conditions can be effectively 
taken into account by a suitable choice of the cut-off radius. In fact panel a of 
Figure 3.5 depicts a contour plot of the Kabsch distance (mean root-square de- 
viation of a structure with respect to the native state [106]) as a function of the 
simulation temperature and cut-off radius. The transition from the region where 
MBP structures are folded (black) to the denatured one (white) is sharp, thus 
pointing out the suitability of the selected approach to model chemical denat- 
uration. Experimentally, Ganesh, Shah, Swaminathan, Surolia and Varadarajan 
have analyzed chemical MBP denaturation by guanidine hydrochloride (Gnd- 
HCI) at constant temperature, T = 28°C. They have shown that the unfolded 
protein concentration pu, at different GndHCl concentrations [D], estimated 
via circular dichroism spectroscopy, could be fitted by means of a standard two- 
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state model (dashed line in Fig. 3.5 panel b). The latter Figure compares the 
denaturation of Gò-model MBP as a function of the cut-off radius Re, with the 
denaturation curve in [118]. As a nativeness indicator for the numerical data we 
employ the already mentioned RMSD. Mapping between simulated and exper- 
imental denaturation results is established after a base-line subtraction of the 
RMSD values and a normalization to the [0, 1] interval. It reads 


a a 


T R.- Ro Re — RO 


[D] (3.9) 


where our reference interaction cut-off R? = 7.5A corresponds to the fully native 
state [D] = 0 in the experiment; R? = 4.0A is in turn the cut-off radius for 
complete denaturation and a is a tunable parameter adjusted to achieve data- 
collapse. 


4.2. Translocation dynamics 


We characterize translocation success (top left panel of Fig. 3.6) in terms of 
translocation probability Pp,. The fraction of translocated simulations is consid- 
ered with respect to the total number of attempts, as a function of the importing 
force F. Figure 3.6 refers to T = 0.75 and Re = 6.8A that, as shown in Fig- 
ure 3.5), is the lowest Re-value corresponding to native-like conformations, for 
both N-pulling (empty symbols) and C-pulling (filled symbols). In addition to 
Py the left side of Figure 3.6, i.e. panels b and c, provides also the fraction of 
loss events, Pz, and unsuccessful events Py = 1- Pr - Prr. Pr is the percentage 
of molecules that escape the capture by diffusion in the bulk. Py is the fraction 
of proteins that are neither translocated nor lost within the time window Tw. The 
step-like shape of Pr, vs. F allows a clear definition of the critical force F, as 
the value for which Pp,(F.) = 1/2. If F « Fe, most of not-translocated runs 
are lost, since the importing force is so weak to be easily overcome by thermal 
motions. By increasing F, Pr, rapidly decays whereas the number of translo- 
cated and unsuccessful runs increases. For F ~ Fe, almost all the untranslocated 
proteins correspond to unsuccessful runs and very few are lost. The overall sce- 
nario also indicates an asymmetric translocation process of folded MBP, which 
depends on the pulling terminus. The difference in Fe and in the number of 
proteins stuck in the pore between N- and C-pulling suggests that the transport 
pathways are different in the two cases. 

To characterize the role of denaturation in the translocation mechanism, we 
estimate F. at different cut-off radii. Panel d of Figure 3.6 shows that translo- 
cation at low denaturation (large .) requires high forces. Decreasing Re, the 
critical force is reduced and below R, © 6.8À it reaches a plateau. Actually, once 
denaturated (Re < 6.8A, see Fig. 3.5), the dynamics of random coil MBP con- 
figurations becomes unaffected by a further reduction of Re. In contrast, as for 
Rx > 6.8À the MBP structures become more and more compact and stable, the 


45 


Coarse-grained molecular dynamics and continuum models 


iL T_T T T_T T T_T 7, e— ®© 
ost Prr e o0 
0.6L eF. ° 

i gio 3 r i 
047 e 2% [ d 
02-4 e i 9 ab 2 

piglet Boia s 

-p 0.84 1.06 iE d 
0.8 L y . 

[ @-----1-111111 na a= 
0.6- av 0 I I 
0.4- a ce è * á pmu 
0.2L å y 04H  ~ K 

2-b aX H y e 

Qui | XW, Jå Pag + 0.2} 

- p. N 
ie “e og Om Py a a i ves 
0.6- A HA pi ili f 
04° so | o 3 5 R 6 7 

n o e 
0.2 C o x ù 
olmi gl 
0.4 0.8 2 
F 


Figure 3.6: Here and in the following figures units are internal, i.e. related to the coarse- 
grained model we used. Distances, when present, are in A. Empty symbols N-pulling, 
filled symbols C-pulling. Left hand side: Statistics of translocation as a function of the 
importing force F at Re = 6.8A and T = 0.75: (a) translocation probability Prr, (b) loss 
event probability Pz and unsuccessful run probability Py (c). Right hand side: Critical 
force F, as a function of Re for T = 0.75, panel d. Loss and unsuccess events probability, 
Pr, and Py respectively, at critical force, panel e. 


increment of the critical F, is naturally expected, due to a stronger resistance to 
unfolding. Therefore, the critical force is a quantity able to discriminate folded 
and unfolded MBP structures, in analogy to the applied voltage in experiments, 
(see [23, 27, 38]). 

A further difference between denaturated and native-like MBP transloca- 
tion is revealed by the amount of loss and unsuccessful events at critical force. 
Panel e of Figure 3.6 shows a clean transition at Re = 6.8À in both Py and Py 
at critical force. At smaller Re (denaturated state) Pr, = 0.5, hence none of the 
unstructured chains gets stuck in the pore. That behavior is compatible with 
the idea of an importing force competing with thermal fluctuations to insert a 
residue in the pore: Fe ~ Fin = kpT/do = 0.2, with do = 3.8À the average 
distance between consecutive residues. Once the first core of residues is im- 
ported, denaturated structures oppose a weak resistance and their translocation 
is easily accomplished. In contrast at larger Re, the critical force is significantly 
greater than Fin. Hence, almost all MBP configurations start getting imported 
and those unable to finalize the transport within the allotted time end up stuck 
into the channel. At low denaturation, long pore blockades are entirely due to 
the structural resistance of folded MBPs to mechanical unfolding. 
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4.3. Residence time statistics 


Figure 3.7 reports the average translocation (or blockade) time m, as a func- 
tion of the importing force for different values of the interaction cut-off Re, for 
both the N-terminus and C-terminus pulling (panel a and b respectively). For a 
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Figure 3.7: Panel a: Average translocation time on the sub-ensemble of successfully 
translocated runs 7) vs. importing force F. The different symbols concern different 
cut-off radii, namely Re = 3.0, 5.0, 6.2, 6.5, 6.8, 7.0, 7,2 and 7.5À are represented by 
upward triangles, downward triangles, squares, circles, diamonds, first-quadrant-filled 
circles, second-quadrant-filled circles and pentagons, respectively. The dashed line rep- 
resents the fit (3.10). Data for N-pulling simulations. Panel b: Average translocation time 
on the sub-ensemble of fully translocated simulations m, vs. importing force F. Differ- 
ent symbols represent different cut-off radii, namely Re = {3.0,5.0, 6.2, 6.5, 6.8, 7.0}A 
and 7.5A, squares, circles, downward triangles, upward triangles, cross, diamonds, stars 
respectively. The dashed line represents the fit, (3.10). Data for C-pulling simulations. 
Panels c and d: translocation time distribution for Re = 6.5A and F = 0.575 (denatu- 
rated) and Re = 6.8A and F = 1.10 (native) respectively. The dashed lines in panel c and 
d are an inverse Gaussian (3.12), and a double exponential fit (3.13). 


given cut-off radius, Tẹ generally decreases with increasing F, as it is expected. 
However for Re < 6.5A, the 7 vs. F data collapse onto a single curve (dashed), 
confirming that MBP structures are completely denaturated and react similarly 
to the importing force. The curve is well fitted by a three-parameter (To, Fo, Ho) 


relation, namely 


L 
T(E) = To e FP + oF” (3.10) 
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which is a linear combination of an activation term œ exp(-F /Fo), and a con- 
stant velocity drift oc 1/F. The origin of the activation term can be explained 
as in what follows. An irreversible translocation can occur only after a stable 
core of residues is established inside the channel. This sort of capture process 
requires the overcoming of the initial entropic barrier due to the strong con- 
finement of the chain induced by the channel, that even unstructured polypep- 
tides experience [114]. For Re > 6.8A, the average translocation time grows with 
the decrement of F (crosses, diamonds, stars in Fig. 3.7a and b), indicating that 
translocation is drastically slowed down. At large enough forces, overwhelming 
the stability of the native-like MBP, 7) tends to collapse on (3.10), regardless the 
value of the cut-off radius. The curve depicted by (3.10) thus constitutes a sort of 
base-line for all translocation times. In low-force regimes, the bending of data 
toward the horizontal line 7 = Tw is an effect due to the time window finiteness. 
In Figure 3.7a and b, that effect has been partially corrected by using the formula 


1 


Th = —— 
Pry + Py 


(Ps TTr + Py Te) (3.11) 


where Tr, is the average time from the translocated runs and Pr; is the cor- 
responding probability (see Fig. 3.6). The use of (3.11) can be justified. Let 
us consider a set of M runs, where Mr of them translocate within Tu, while 
Mu do not accomplish the passage but are expected to translocate in a time 
greater than Tw and My, are lost (will practically never translocate), obviously 
M = My + Mr, + Mv. By definition, after discarding lost events, the average 
translocation time is 
1 Mr+My 


t= = rr 
My + My 


tr, 
r=1 
where t, is the translocation time of the r-th run (thought for Tw > 00). We 
could rewrite 7 by separating the runs for which t, < Tw from the runs for 
which t, > Tu and, consequently, split the summation into 
Mr I Mo Mu 1 “Ss 


d A 


ecs de 
Mr + My Mr {zi Mr + My Mu Fi 


Here, the first average involves the terms t/. < Tw, while the second one the 
terms t’ > Tw. By considering that Py = My/M and Py, = Mr/M, we get the 
expression 

1 


To = ———_ 
Prr + Py 


(Parn +PoTu) 


where we have used the lower bound for t! > 7, and the formula 
1 5M 
TTr = Mr ear Li 
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Panel c of Figure 3.7 shows a typical translocation time distribution y(t) of 
a denaturated MBP. It looks well localized around its average 7, and it is properly 
fitted by an inverse Gaussian, with parameters Do, jo: 


L L-wuoFt)2 
TEDA exp : Cowie}, 6.1) 
0 0 


that is the first-passage time distribution of biased random walkers on the inter- 
val [-00, L] emitted in 0 and absorbed at -o0 and L [57]. An instance of the 
time distribution w(t) of native-like MBP translocation is plotted in panel d. In 
this case, w(t) has a fatter large-time tail and it is well reproduced by a double 
exponential: 


p(t) = 


_ kik —ki(t-to) _ ,—ke(t-to) 
w(t) = da È e | (3.13) 
with rates k1, k2 and to being an offset time interpreted as the time taken by a 
denaturated MBP to cross the pore. As we shall see below, the double exponen- 
tial is justified by the presence of two successive stall points, corresponding to 
two energy barriers, the overcoming of which can be seen as an activated pro- 
cesses. Translocation phenomenology is better characterized by addressing the 
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Figure 3.8: Time evolution of the number of residues on the cis-side of the pore Neis 
for a pulling run at Re = 6.8Å and critical force (panel A: C-terminus pulling, panel 
B: N-terminus pulling). Snapshots of representative configurations are reported for the 
plateaus corresponding to the protein blockades. The average time spent by the protein in 
different Neis, at critical force, is reported in panels C, D and E for Re = 6.8À C-pulling, 
Re = 6.8À N-pulling, Re = 6.5A C-pulling, respectively. 


time evolution of the number of residues Neis on the cis-side of the pore. Panel A 
of Figure 3.8 shows Neis (t) for one successful run at critical force for Re = 6.8A, 
pulled from the C-terminus. Clearly, most part of the time is spent by the MBP in 
two particular stalling states, Neis © 335 (C-Stl) and Neis © 267 (C-St2), corre- 
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sponding to configurations where either residue 335 or residue 267 are respec- 
tively located at the pore entrance. Representative snapshots of the two states 
are included in the figure for illustration. To gain more quantitative informa- 
tion we compute the time t, (Neis) spent by the r-th run in a given Neis state 
during translocation. Time t( Neis ), defined as the average of t, (Neis ) over the 
ensemble of translocated runs, is plotted in panel C of Figure 3.8, for C-pulling 
translocations at critical force and Re = 6.8A. The two peaks in the histogram 
correspond to the two blockade events shown in panel A. Clearly, translocation 
is far from being uniform and looks more as a step-like (or stick and slip) pro- 
cess, where the protein has to overcome successive free-energy barriers associ- 
ated with specific structural rearrangements. The analysis of N-pulling (same 
cut-off radius and corresponding critical force), panels B and D in Figure 3.8, 
confirms the overall picture: also in this case the protein spends most part of 
the time in specific configurations. The stalling events are three and take place 
at different positions, namely Neis = 363 (N-Stl), corresponding to residue 7 at 
the pore mouth, Neis = 307 (N-St2, residue 63 at the mouth) and Neis = 256 
(residue 114, N-St3). To complete the phenomenology, we note that the transport 
of denaturated proteins (Re < 6.8À) appears much more uniform (peak-less), 
panel E of Figure 3.8. 

The scenario for both N and C-pulling is robust under changes of both Re 
(provided that Re > 6.8À, i.e. native-like MBP) and pulling force range F, with 
only slight differences in the peak intensity (Fig. 3.9). Concerning the unsuc- 
cessful translocations, we find that the already mentioned stall points still occur 
(Fig. 3.9 and Fig. 3.10), just to confirm their leading role in transport bottlenecks. 
In summary, panel B ofFigure 3.1 sketches in a 2D-topological view the positions 
ofthe five stall points detected for structured protein translocation runs (residues 
7, 63, 114, 267, 335). 

The loss events can provide insights into the translocation process and might 
be related to experimental observations. A typical event observed in loss sim- 
ulations is that the protein spends some time at the pore entrance, attempting 
to insert the terminus, before definitely escaping the capture by diffusion. The 
duration of the events in which the number of residues inside the pore is greater 
than 0 is well determined in lost simulations and defines a sort of protein-pore 
interaction time tz. Moreover, reasonably assuming that in between two suc- 
cessive events the protein remains close enough to the pore to generate some 
kind of blockage, it is possible to attempt a closer connection between simula- 
tion results and experimental findings. Panel A of Figure 3.11 depicts the inter- 
action time distribution obtained for a denaturated protein while panel B shows 
the trend of the average interaction time 77 as a function of the denaturation 
degree (as usual characterized by the Go-model interaction cut-off Re) for dif- 
ferent importing force intensities. A common behavior of unfolded structures is 
evident, while the average interaction time for compact molecules results greater 
and dependent on the cut-offradius. Although a quantitative comparison might 
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Figure 3.9: Average time spent by the protein as a function of Neis. Panels of the first 
row refer to translocated simulations (left), and unsuccessful runs (right), both for C- 
pulling at critical force, R = 7.2À. The latter panel clearly shows the presence of the two 
main blockades (C-Stl and C-St2), which can be detected also for successful simulations, 
thus confirming the robustness of the data shown in panel C of Figure 3.8 with respect to 
changes in Re. In the second row, analogous results are reported for N-pulling translo- 
cations: three main peaks are evident (N-Stl, N-St2 and N-St3). Finally, in the last row 
the same data for successful simulations for Re = 6.5 A (left) and Re = 6.2A (right), 
both for N-pulling at critical force are depicted. For Re = 6.5A the peak structure still 
partially holds (two on three peaks are still detectable) and it is necessary to decrease the 
value of the cut-off radius to 6.2A to recover a homogeneous behavior equivalent to the 
one shown in panel E of Fig 3.8, where the pulling is from the C-terminus. 


be too ambitious, the scenario we observe in the simulations closely resembles 
the presence of the ‘bumping events’ reported in the experiments by Mestorf et 
al. [28] and classified as short-time low-intensity current blockades. In prac- 
tice, the protein moves close to the pore partly blocking the current but does not 
start a translocation. The bumping time distribution results to be well fitted by 
an exponential (Fig. 3.11A) and its average is about one order of magnitude lower 
than the corresponding average translocation time, a fact qualitatively in agree- 
ment with the experimental data (for the numerical specific case here considered 
T)/T1r = 50). 


4.4. Stretching vs. translocation 
The previous picture of MBP translocation rises the natural question as to 


why the transport of initially compact structures becomes temporarily stalled 
at well defined stages. We relate these ‘rate limiting steps’ of translocation to 
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Figure 3.10: Average time spent by the protein as a function of Ncis for unsuccessful runs 
for Re = 6.8 A. Left panel refers to C-pulling, right panel to N-pulling. The two peaks in 
the left panel and the three in the right one correspond to the peaks in panels C and D of 
Fig 3.8, respectively. 


20000 r 
1 
wf ! 
ù es 
L P sat 
\ aexp al 15000 i 
\ v 
P À i 
If oN TU n 
1 
ict % 10000 | : doce I 
N ! Native 
' 
\ ' 
Y i 
\ 4 500 1 
[aN Denaturated i 
IC \ ° a at 
1 
L L 


o 1000 2000 3000 4COC 3 4 5 6 7 5 

tu R 

Figure 3.11: Panel A: Protein-pore interaction time (tz) distribution and associated ex- 

ponential fit for unfolded protein and small pulling force. Panel B: Trend of average in- 

terference time as a function of the pulling force and cut-off radius. Unfolded structures 

show a common behavior at short 77 while a greater average interaction time is associ- 

ated to compact structures. The vertical dashed line marks the R,-threshold separating 

in the Go-model approach native-like from fully denaturated conformations. Different 

symbols correspond to four increasing force intensities (in the order: squares, circles, up 
and down triangles). 


the MBP structural properties that govern the free-space stretching. As men- 
tioned in the preamble of this chapter, targeted AFM experiments [98] have 
shown that MBP mechanical stretching occurs via a sequence of events corre- 
sponding to the successive breakdown of specific domains, termed unfoldons. 
The analogy of pulled translocation with AFM mechanical stretching suggests 
that unfoldons might be involved also in MBP translocation bottlenecks. Pre- 
liminarily, we check, via standard Steered Molecular Dynamics stretching pro- 
tocol [117] that the Go-model of the MBP is able to reproduce the ‘unfoldon 
picture’ observed in the experiments [98]. In Figure 3.12A the average over 50 
runs of the force-extension (end-to-end distance Ree) curve of the MBP in the 
numerical stretching experiment is depicted. The three branches, denoted by 
MI, M2+M3 and M4 according to the nomenclature used in [98], separated by 
the three worm-like-chain curves [119], identify the same unfoldon opening se- 
quence described in the reference experimental work, including the coupled un- 
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Figure 3.12: Right: force (A) and average value of fraction of native contacts Qs (Mk) 
k = 1,4 (B) vs. the end to end distance Ree in stretching simulation at pulling velocity 
v = 0.05, temperature T = 0.5 and cut-off radius Re = 7.5À. Dotted lines represent 
worm-like chain fits [119]. Left: average value of Qs(Mk) as a function of Neis for Re = 
6.8 and T = 0.75 for N-pulling (C) and C-pulling (D) translocated runs at critical force. 
Vertical black lines mark the residues mainly involved in the translocation bottlenecks. 


folding of M2 and M3. Also, the stretching force required to perform constant 
velocity detachment follows a peculiar trend qualitatively in compliance with the 
experimental evidence (see Fig. 3.13 for a quick summary of the latter). The pic- 
ture becomes clear when plotting the average fraction of the active native con- 
tacts Q,(Mk), k = 1,...,4 in each unfoldon as a function of the end-to-end 
distance Ree of the MBP. For instance, in the initial stage of the elongation, cor- 
responding to the curve branch labeled by MI, the structural parameter @ (M1), 
dash-dotted line, decreases indicating the ròle of M1 detachment in this stage. 
The simultaneous opening of M2 and M3, solid and dashed line respectively, is 
also evident in the second branch. Finally M4 opens. Snapshots of the detach- 
ment sequence are shown in Figure 3.14. 

To understand the role of the unfoldons in the MBP translocation process, 
first we locate the five blockage-points with respect to the unfoldons in the pro- 
tein structure (Fig. 3.1). Then, again, we plot the unfoldon structural parameters 
Qs(Mk) as a function of Neis for C-pulled and N-pulled translocations, re- 
spectively (Fig. 3.12C and D). In the C-terminus pulling case (Fig. 3.12C), the se- 
quence of unfoldon opening is essentially the same as in the mechanical stretch- 
ing. First, M1 breaks down, followed by M2 and M3, which dynamics is again 
correlated, and finally the M4-opening concludes the process. The vertical dot- 
ted lines in Figure 3.12C and D highlight the two stall points C-Stl, C-St2 in the 
C-pulling and the three other ones, namely N-Stl, N-St2, N-St3, in the N-pulling 
(see discussion of Fig. 3.8). A very weak correlation emerges between unfoldon 
dynamics and blockades. Specifically, the two configurations reported in panel 
A of Figure 3.8 (see also Fig. 3.1B and the vertical lines in Fig. 3.12C and D) show 
that the first blockage-point for C-pulling (residue 267) takes places in unfoldon 
MI (darkest gray in the figure), while the second one (residue 335) lies in the 
initial part of M2 (light gray). In the pulling from the N-terminus, we observe 
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Figure 3.13: Adapted from [98]. Characteristic force-extension trace of a ddFLN1-5-MBP 
chimera [98]. Unfolding of the native protein and subsequent elongation of the polymer 
is shown in the darkest gray (unfoldon M1), the breakdown of M2 is shown in light gray, 
the detachment of M3 is shown in dark gray and the unfolding of M4 is shown in the 
lightest gray. Contour length increases are marked on worm-like-chain (WLC) curves fit 
to the data. At extensions > 125nm the characteristic unfolding pattern of three ddFLN 
scaffold domains is visible (gray, leftmost and rightmost ends), followed by detachment of 
the protein. Inset: Superposition of several MBP unfolding traces zoomed into the region 
of the short-lived intermediate state M34 (dark gray) resulting from the detachment of M2 
(light gray). A best WLC fit to the curves is shown in black. All light gray levels (lifetime 
of the intermediate state M34) fall on the same WLC curve, indicating the breakdown of 
a well-defined state. 


that M3 is the first to be broken followed by M4, then M2 and finally M1. Again 
the blockade-points do not appear directly related to the unfoldon boundaries, 
panel B of Figure 3.8, with the sole exception of the stall N-St3, located at the 
boundary between M3 and M4. 


5. Discussion 


An essential result of our analysis is a sharp change in the translocation dy- 
namics upon varying the cut-off radius from Re = 6.5A to Re = 6.8A. We recall 
that, according to (3.9), a mapping between R, and the MBP denaturation de- 
gree can be established (Fig. 3.5). 

Denaturated-MBP translocation (R, < 6.5A). Below R, = 6.5A, the MBP 
behaves as a random coil: its translocation dynamics is basically independent 
of Re as revealed by the critical force behavior (Fig. 3.6d) that remains almost 
constant for Re < 6.5A and by the blockage time data (Fig. 3.7a and b) that 
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Figure 3.14: Snapshots of MBP unfolding at T = 0.5, Re = 7.5 A and v = 0.001. The 
unfolding sequence is the one described by Bertz and Rief, [98]. The first part to open 
is the unfoldon M1 (darkest gray) as shown in panel B, then M2 and M3 unfold (light 
and dark gray), panels C, D and E. With the exception of the initial portion of M3, the 
last structure to detach is M4 (lighest gray), panels F and G. In panel, A the initial folded 
MBP conformation is depicted. 


collapse onto the same curve for every Re, thus exhibiting a common behavior, 
(3.10). In this R,-range, not only the critical force F, but also the probability of 
lost events at Fe shows a plateau (Fig. 3.6e). Moreover, the pulling terminus (C or 
N) does not influence significantly any of these quantities. The sole barrier in the 
translocation dynamics is due to the pore entrance; once the protein is captured, 
the average time spent at different stages of the process is similar (panel E of 
Fig. 3.8). Thus, random-coil MBP translocation can be interpreted as a capture 
(activated) stage followed by an elementary first passage process, where random 
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walkers under a constant bias are injected from the cis-side of the channel and 
absorbed to the trans-side. 

Native-like MBP translocation (R. > 6.8A). Folded structures exhibit a 
much more complex dynamics, characterized by a transport getting stalled in 
the pore due to specific MBP configurations. Translocation times are greater 
than the unfolded case. Their average increases with Re (Fig. 3.7a and b) and 
their distributions show fat tails (Fig. 3.7d). These results agree with the experi- 
ments described in [23], where, in the explored forcing regime, only denaturated 
proteins were able to rapidly translocate, whereas very long blockades pertained 
to partially folded ones. The capability of numerical techniques to explore the 
dynamical details evidences the presence of stable and reproducible bottlenecks 
of the transport, namely C-Stl, C-St2, N-Stl, N-St2, N-St3 in native-like MBP 
transport (Figs. 3.8, 3.9 and 3.10). Natural candidates for such stall points are the 
boundaries between MBP unfoldons. However, our results clearly indicates that 
the AFM-unfolding dynamics is not correlated to the stalls. 

Rather, the analysis of MBP native-contact maps allows us to shed light on 
the actual mechanism underlying the stalls. Indeed, the blocking C-Stl and C- 
St2, but also their counterparts for the N-pull case, can be interpreted in terms of 
a specific sequence of native-contact breaking an MBP sub-domain undertakes 
in order to engage the pore in an almost linear conformation. In fact, the stalling 
domains of the protein are those with the greater amount of external native con- 
tacts, where external means ‘excluding all the inner contacts of the domain. In 
formulas, if K is the domain, then its contact density reads 


pe(K)=— YY Muy, (3.14) 


MK ick jéK 


where mx denotes the number of residues in K and M;; = 1, if i and j form 
a native contact, and M;; = 0 otherwise. In Figure 3.15 the contact map of na- 
tive MBP, obtained for Re = 6.8A is considered. The lower side panel shows 
the average of external contacts composed by the ten-residue-long consecutive 
regions of the MBP. Peaks identify the protein regions with the greater number 
of contacts that are thus the most probable candidates to cause the stall points 
once engaging the pore entrance. However, by considering the dynamics data 
(like the configurations associated with the stalls, Fig. 3.8C for example), we can 
be more precise by selecting the two critical regions around the peaks C-Stl and 
C-St2: C-K1 = [328, 338] and C-K2 = [260, 270], respectively, and the comple- 
mentary regions X1 = [339, 370], X2 = [271,327] and X3 = [1, 259], so to patch 
the whole molecular structure. Table 3.1 summarizes the density of external con- 
tacts (see (3.14)) that these regions form with the part of the molecule running 
from the proper boundary ofthe issued region to the free terminus (a generaliza- 
tion of this approach concerning only a static analysis of contact maps - without 
resorting to dynamical data - is provided in the next chapter). For instance, ex- 


56 


Marco Bacci 


n x 
350 — 
L F b Fa J 
300 bg na . 4 1 
Na ] 
È +4" ] 
250 È Mi n 7 a 
g % 4 + hi ] 
2l + ff £ 
2 È NE i b 
150} » Pes = 
È m J 
E * ue, » | J 
100 È Fa + 1 
E d | 
Pe Fa a|] 
sof di - h 
i ] 
a CAPO TRO SA IE 
E LL 50, 100) 150 200 259 300 | 350 | 
ace . 
52 
2 0 


50 100 150 200 250 300 350 
residue 


Figure 3.15: Contact map of native MBP obtained for Re = 6.8À. Bottom panel depicts 
the average number of long range contacts each residue establishes. The peaks identify the 
regions of the chains with the larger density of external contacts, which are responsible for 
putative stall points. Vertical gray bands highlight the stalling regions as obtained by the 
analysis relevant to Figure 3.8. These regions form a significant number of contacts with 
distal parts ofthe molecule and only when the contact cluster at the pore cis-side mouth is 
broken the translocation can proceed further on, up to the next bottleneck (next cluster). 


ternal contacts pertaining to C-K1 bond this domain with residues from 327 to 
1 (N-Terminus), as here C-terminus pulling blockades are exemplified. This ap- 
proach allows tracking only the effective external contacts that come into play 
when a region is facing the pore, namely the contacts formed with just that part 
of the molecule which has not yet engaged the pore (backward contacts). The 
higher values pertain to C-K1 and C-K2 sub-domains. They confirm the picture 
emerging from Figure 3.15. These areas, being with high contact density, oppose 
the maximal resistance to unfolding once in front of the pore. In other words, 
translocation bottlenecks are determined by those sub-domains that, still par- 
tially folded when approaching the pore cis-side, carry with themselves other 
distal regions of the molecule tightly bonded to by native interactions. The con- 
tact analysis leads to the same conclusion also for the N-terminus pulling, where 
the critical regions are N-K1 = [1, 11], N-K2 = [55, 63], N-K3 = [105, 115]. 
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C-pulling N-pulling 
Name | Region p Name | Region p 
XI [339,370] | 0.75 || N-K1 | [1,11] 3.64 
C-K1 | [328,338] | 2.00 Y1 [12,54] 0.47 
X2 [271,327] | 1.07 || N-K2 | [55,63] 2.33 
C-K2 | [260,270] | 5.09 Y2 [64, 104] 0.93 
X3 [1, 259] 0.00 || N-K3 | [105,115] | 5.45 
—- ——— —- Y3 [116,370] | 0.00 


Table 3.1: Contact density of MBP subdomains described in the text. Col.l: domain 
nomenclature, C-K1, C-K2 and N-K1, N-K2, N-K3 stand for critical blocks in C- and 
N-pulling respectively, X1, X2, X3 and Y1, Y2 and Y3 are the complementary regions. 
Col.2: residues involved in each sub-domain. Col.3: density of external native contacts 
as defined in (3.14). Critical blocks have a contact density larger than complementary 
regions. 


6. Remarks 


We have simulated Maltose Binding protein translocation across aHL 
nanopore via a coarse-grained computational model for both the MBP and 
the pore. As the channel is narrow, translocation properties strongly depend 
on the denaturation state of the MBP. In the issued Go-like model, molecule 
denaturation is controlled by the parameter Re determining the number of 
native attractive interactions. In the region (6.5 < Re < 6.8)A, a transition is 
observed from random-coil MBP (denaturated) to native-like structures. The 
transition emerges from both equilibrium (Fig. 3.5) and transport simulations 
(Fig. 3.6). In particular, translocation of denaturated MBP is almost uniform 
and consists of a capture stage followed by a simple driven diffusion process. 
The passage in the channel of folded MBP is more critical and interesting, it 
looks like a stick-slip dynamics characterized by constant-velocity transport 
broken by stalling events in the channel (Fig. 3.8). For instance, the C-terminus 
translocation occurs via two long stall events resulting in a double-exponential 
tail-behavior of the translocation time distribution (Fig. 3.7d). This behavior 
is presumably associated with the presence of two subsequent free-energy 
barriers that MBP has to overcome in order to complete the passage. Our 
analysis also shows that stall events are related to the MBP regions with high 
density of native external contacts. Thus, in principle, long blockade events and 
stall points can be predicted by looking directly at the MBP PDB-structure. In 
contrast, a weak correlation is found between stall points and unfoldons, the 
structural blocks through which the MBP reacts to mechanical stretching in 
free space. This result is a strong indication that, despite the analogy between 
pulled translocation and mechanical unfolding, the pathways gathered from 
mechanical pulling are not sufficient to make inferences on translocation 
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mechanisms. The action of the pore, indeed, drastically modifies the unfolding 
pathway during the transport with respect to a free pulling process. 
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Chapter 4 
1-Dimensional Continuum Models 


1. Preamble 


In order to interpret numerical results it is appropriate to built suited ana- 
lytical models of the physical and chemical processes that occur in real translo- 
cations and to define proper statistical observables to grasp and summarize the 
essence of the phenomenon. 

The first section of the present chapter is dedicated to an overview of the es- 
sential ideas behind low-dimensional continuum models used in the literature 
to summarize both experimental and three-dimensional molecular dynamics re- 
sults. 

Section 3 highlights the practical implementation of the specific approach 
performed to obtain the free energy profile for the translocation simulation pro- 
tocol described in chapter 3. Therefore, umbrella sampling simulations and mul- 
tiple weighted histogram analysis method on coarse-grained molecular dynam- 
ics (MD) simulations are outlined [96]. The proteins under scrutiny are again de- 
scribed by the Go-like model already introduced, which accounts accurately for 
the influence of the native structures on both folding and translocation pathways 
[56, 114, 115, 120]. Finally, section 4 expounds the achieved results and presents a 
comparison with the 3D numerical investigations. Also, since a mutant version 
of the MBP (termed MalE262) and an additional protein, the Endoglucanase 
Cel5A (PDB-ID:1a3h), have been considered in this perspective, relevant out- 
comes are provided as well. 


2. Overview 


2.1. The general framework 


Experimental results highlight the presence of energy barriers that the 
molecules have to overcome to accomplish translocation [121]. Also, empirical 
distributions of blockade times show tail asymmetry typical of phenomena 
where the motion is due to diffusion and drift effects [16, 21]. However, even for 
a coarse-grained description of the protein dynamics, the energy landscape 
is still too high-dimensional to allow a concise representation of the process. 
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It is hence convenient to project the system evolution over relatively few 
effective coordinates, Q1,...,Qx (termed reaction coordinates or collective 
variables), functions of the positions of the amino acids constituting the 
molecule [122]. A translocation coordinate should enable us to parametrize 
the walk of the protein along the pore. Since the evolution of each Q; is 
expected to undergo random fluctuations, we are interested in the probability, 
P(Qi,...,Qx,t), for the system to visit a state Q1,...,Qx at time t, given an 
initial condition P(Q1,...,Qx,0). Two main questions arise: i) What are the 
most representative reaction coordinates Q; in order to properly describe a 
translocation process? ii) What is the equation governing the evolution of P? 
In biopolymer translocation, a set of reaction coordinates that we can con- 
sider appropriate should smoothly trace without ambiguity the pathways be- 
tween the two states (cis and trans) with respect to the pore ends. The com- 
mon cylindrical symmetry of the narrow pores suggests to project the chain dy- 
namics along the channel axis (x-axis without loss of generality) and consider 
nugatory the transversal motion (y, z). Hence, the kinetics can be reasonably 
parametrized by a single reaction coordinate Q, expected to be a function of the 
x-coordinates of the residues only. The choice of Q is not a trivial task, there is 
neither a unique option nor a general rule, and usually an appropriate choice de- 
pends on the specific problem. For instance, in the case of a small protein pulled 
through a semi-infinite [54] or long enough [55] pore, the natural reaction coor- 
dinate can be either the x-position of the pulled residue or the x-component of 
the center of mass. Other authors found it reasonable to introduce, as reaction 
coordinate, the number of monomers in a given region of the space [74, 79, 123]. 
The derivation of an evolution equation for P, starting from the equations 
of motion of the full system, is generally problematic. Moreover, in general, the 
dynamics of Q in time is a stochastic process. A simplified approach is based 
on the assumption that the evolution of the collective variable is the slowest 
in the system, with a strong separation from other time scales, supposed 
irrelevant. Translocation is then considered completed when the stochastic 
process Q(t) crosses for the first time a threshold value Qin, corresponding to 
the protein exiting the pore from the trans-side. This point of view clearly de- 
fines a first-passage problem in a time window Tw, with exit time tout defined by 


tout = min{0 <isTy | Q(t) = Qu} 


In this framework, the translocation phenomenology can be conveniently recast 
in the motion of an effective particle at place Q, undergoing a driven diffusion 
in a potential of mean force V(Q) [57]. Here, V(Q,t) = G(Q) + W(Q) and 
G(Q) is the free-energy profile in the variable Q of the unperturbed system, 
while W (Q) is the work done by the importing force. In a first approximation, 
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we can presume that the probability P(Q, t) obeys the Smoluchowski equation 


OP OJ 
mag CEN 
= „dV(Q) dP 
J(Q,t) = -uP 10 Dozo’ (4.2) 


where po is the effective mobility coefficient and Do the effective diffusion one. 
At equilibrium, they satisfy the fluctuation-dissipation relation: uo = Do/(kpT). 
In (4.1), the contributions of the fast degrees of freedom are collected in a sin- 
gle diffusion term 0(Dj)0P/0Q)/O0Q. The Smoluchowski equation is satisfied 
by the probability to find the reaction coordinate at place Q, at time t. Such an 
equation pertains, thus, to a population of random walkers undergoing an over- 
damped Langevin driven diffusion in the potential of the mean force V(Q). As 
far as the results stemming from this approach are concerned (sec. 4), (4.1) has 
not been solved directly, rather we have found it more convenient to solve sev- 
eral 1D Langevin runs from (4.3), an approach that results equivalent to solving 
the Smoluchowski equation. 


V(Q(2)) = G(Q(2)) +W(Q(z)) 


In stochastic-process first-passage theory (see [57]), the basic quantity is the sur- 
vival probability S(t), given by 


(4.3) 


Qin 
S(t) = i. “dQ P(Q.t), (4.4) 


that estimates the number of systems for which Q is still in the interval [Qo, Qin | 
as a function of time, corresponding to the protein still in the channel. Here Qo 
is the value of Q at the pore entrance. Hence, 1- S(t) amounts to the probability 
to be outside the pore at time t. Therefore its derivative 


_dS(t) = qo OP (Qt) 


ar i at 


determines the distribution y(t) of the exit times. Thus, by using the Smolu- 
chowski (4.1), we get: 


W(t) = J(Qtn, t) - J(Qo,t), (4.5) 


namely the exit-time distribution is fully determined by the values of the 
probability current J(Q,t) at the boundaries (pore ends). Common choices 
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for a generic boundary condition Qg can be the following ones: absorbing 
P(Qp,t) = 0, reflecting J(Qp,t) = 0 or mixed J(Qp,t) = Kp P(Qp,t). The 
last one seems to be more realistic for describing translocation processes, as 
real pore ends cannot be considered neither perfectly reflecting nor perfectly 
absorbing. The solution P(Q,t) to (4.1) cannot be found in general for an 
arbitrary potential. However, even without solving the Smoluchowski equation, 
the exact analytic expression can be found for the average translocation time 
ty(F,L) and probability Pp.(F,L) as a function of the importing force 
F and the pore length L [55, 76, 77]. The theoretical prediction of 7(t) 
requires instead a complete closed solution. Therefore, only simple cases can be 
worked out or approximated expressions can be obtained [55, 75, 77, 114] and, 
in specific conditions, we can reproduce both simulation and experimental 
data. For instance, the inverse Gaussian function (3.12) is the first-passage 
time distribution of a diffusion process in the interval [-00, L], starting from 
x = 0, with absorbing boundaries in -o0 and L, and driven by a constant drift 
oF [57]. Despite its simple origin, the approach is able to match observed 
translocation time distributions under specific conditions [38]. 

In [55] the phenomenology of the transport appears more complicated than 
a simple driven diffusion, but the mild shape of the free-energy (Fig. 2.6) permits 
a simple interpretation of the translocation kinetics in terms ofa single jump over 
an extended barrier followed by a diffusion driven by a constant force. The the- 
oretical predictions in [55] were corroborated by translocation time histograms 
(Fig. 4.1) which are well fitted by first-passage-time distributions y(t), obtained 
by numerically solving the Smoluchowski equation (4.1) in the spatial coordinate 
xin [0, L]. Only a constant-force drift has been used in [55] under mixed bound- 
ary conditions, namely J(0,t) = —KoP(0,t), J(L,t) = KyP(L,t), where Ko 
and Kz have been considered adjustable parameters. 

It appears then necessary to calculate the free energy profile of the whole 
process to take into account the barriers that hamper the crossing. A quite 
common approach for similar issues refers to the umbrella sampling method 
[95], a computational technique used to improve sampling of systems which 
phase space is dived into subdomains separated by high energy jumps. The 
umbrella sampling technique is based on the choice of a fictitious potential to 
overcome the energy constraints so to explore even the most unlikely areas of 
the phase space. The thermodynamics properties calculated in this way need to 
be deweighted from the influence of the artificial potential and recombined in 
order to provide the unbiased probability (free-energy). This purpose can be 
achieved, for example, by employing the weighted histogram analysis method 
(WHAM) [96, 124]. 


64 


Marco Bacci 


9e-05 


P(t) 


3e-05+ 


Figure 4.1: Adapted from [55]. Probability distribution of translocation times for Ubiq- 
uitin translocation ina L = 300A pore [55]. The dashed line is an inverse Gaussian 
fit (3.12) of the histogram while the continuous line is the results obtained solving (4.1) in 
the domain [0, L] with V (x) from Fig. 2.6, constant pulling force and radiative boundary 
conditions. 


3. Models and methods 


For the analyses presented in this chapter, an additional protein, the so- 
called Endoglucanase Cel5a from bacillus Agaradherans [125] (PDB-ID:1a3h) 
is considered in addition to a mutated version of the MBP. The latter is called 
MalE262 as it is mutated on residues 263 and 264. The mutation is obtained by 
simply quenching the 14 native contacts belonging to the issued beads. Ther- 
malization and translocation simulations are performed for both MalE262 and 
1A3H. The former ones are addressed to establish the stability of protein struc- 
tures, the latter ones aim to further study the transport through the model pore 
described in section 2.2, which is tailored to allow single-file translocation only. 
The translocation simulation protocol has been already described in the previ- 
ous chapter and it is adopted for both MalE262 and 1A3H transport runs. Sim- 
ulations are performed at critical force Fe, i.e. the load intensity for which the 
translocation probability is 50%. This condition is believed the optimal com- 
promise between statistical accuracy and the slow translocation speed needed 
to both amplify the stall duration and possibly obtain a quasi-equilibrium pro- 
cess. Results are not presented so much in depth as for the MBP because they do 
not add crucial information. However, features such as the translocation stalls 
will be shown along with the exposition of the 1D-continuum approach results. 
Such results stem from the achievement of the free energy profile in a meaningful 
reaction coordinate Q by umbrella sampling simulations and WHAM method 
[96]. Also, the free energy profile G(Q) is used in the formulation of the po- 
tential of the mean force to perform a set of one-dimensional Langevin runs in 
order to asses the suitability of this approach to interpret, retain and summarize 
the essential features of the 3D translocation dynamics, the one in (4.3). 
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The umbrella potential Vin» reads 


Vamo(Q(x)) = Zhu (Q(x) ~ Qo)”. (4.6) 


The translocation process is parametrized in terms of Q = N,ignt — Nie ft, with 
Nrignt and Nieft the number of residues outside the pore on its right and left 
side respectively (Fig. 4.2A). Translocation is thought to proceed from left to 
right when the pulled terminus is the C-terminus and from right to left for N- 
pulled translocations. In other words, the cis-side of the pore coincides with the 
left side in the former case and with the right side in the latter one (the trans- 
side accordingly). This nomenclature is used consistently throughout, with C- 
pulling and N-pulling experiments proceeding, for example for the MBP, from 
Q = -370 to Q = 370 and from Q = 370 to Q = -370, respectively (as already 
stated, the MBP counts 370 residues, i.e. Q € [-370 : 370]). The associated color 
code used in the figures is light gray for N-pulling and dark gray for C-pulling. 
The collective variable Q can be expressed as a function of the x-axis coordinates 
of the residues in the following way: 


12 


Q(x) = LI [tanh(ax;)+tanh[a(x;-L))], (4.7) 


N i 
i 


with m the number of residues of the protein, L the pore length anda = 3.0A-! a 
smoothing parameter. Equation (4.7) is a smooth version of the discrete variable 
Q = Nright- Nieft. In order to compute the free energy profile, a set of umbrella 
sampling runs with different equilibrium values Qo is performed. In particular, 
the interval between different Qo is 4, hence, in the case of the MBP, 184 dif- 
ferent runs are used to reconstruct the profile Qo = -368, -364,..., 364, 368. 
We find it convenient to assign different values to the elastic constant ka of the 
harmonic umbrella potential depending on Qo, in order to better control simu- 
lation outcomes. In particular, such a shrewdness allow us to explore the whole 
region of the phase space associated with the reaction coordinate, maintaining at 
the same time a good overlap between adjacent histograms and a limited amount 
of sampling windows. For the sake of completeness, ku € [0.22 : 2] for any Qo, 
where the greater values are used for the higher values of |()o|. Indeed, higher 
values of Qo correspond to the regions of the phase space where it is easier to 
lose the protein in the bulk even for very small fluctuations of the reaction coor- 
dinate from the equilibrium value, i.e. the regions in which the molecule is close 
to the pore mouths. Initial configurations for the umbrella sampling runs are 
borrowed from the configurations explored during translocation simulations by 
selecting the ones closer to the Qo-value employed in the issued sampling. Each 
simulation runs for a time suited to collect proper uncorrelated statistics (1000 
points for each histogram with a correlation time equal to 150 internal units; 
the latter value was determined by preliminary simulations). Moreover, the first 
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10* time units (equal to 10% of the translocation simulation time window) are 
not used for histogram collection, in order to allow the initial configuration to 
lose memory of the transport process and to foster the possible refolding at the 
trans-side. Multiple weighted histogram analysis method is applied to deweight 
the data arising from the procedure in order to obtain the free energy profile 
G(Q). Further details on specific simulation protocols and employed quanti- 
ties, such as the contact backward density, are given in section 4. 


4. Results 


In this section we show that the translocation of proteins across narrow 
pores is generally characterized by the occurrence of long stall events, which 
main features are essentially retained in a 1D description of the process involv- 
ing the evolution of a proper collective variable, the one in equation (4.7), in the 
associated free energy landscape. This 1D approach is particularly appropriate 
when the translocation proceeds slowly, that is when the critical driving force 
is low enough. Indeed, the free energy profile is just a collection of thermody- 
namics states along the translocation pathway, which might not be in accordance 
with the non-equilibrium configurations explored during the transport. 

In particular it is pointed out that both the ramps along the free energy 
trends and the translocation bottlenecks result to be strongly associated with 
the structural properties of the folded proteins. In fact, they correspond sys- 
tematically to the regions along the backbone chain with a dense distribution of 
long-range native contacts with the untranslocated portion of the protein, the so- 
called backward contacts. These contacts, as we shall see, constitute a refinement 
of what already introduced in section 5. In other words, the backward contact 
density along the protein structure reflects into a step-like structure of the free 
energy, besides determining the stalls of the transport. 

The 1D Smoluchowski/Langevin approach retains the essence of the translo- 
cation phenomenon, although some unavoidable limitations, due to the simpli- 
fications introduced by the 1D modeling, can arise, especially if the slip transport 
phases in between successive stalls in the 3D molecular dynamics (MD) simula- 
tions are very fast due to a high critical force needed to perform the passage. 

The overall emerging picture suggests to decompose the free energy land- 
scape into its entropic and energetic terms. This approach confirms the impor- 
tance of the regions with a dense distribution of long range interactions, which 
govern both internal energy and entropy trends. In fact, once such bonds are 
broken, we recognize not only an increment in the average value of the internal 
energy, but also a similar jump in the entropic term. The latter one indicates that 
the unfolding associated with contact severance prevails upon the entropy re- 
duction due to the pore confinement. Such a behavior is different from the one 
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relevant to an unfolded protein, as the latter shows a decreasing (or stationary) 
entropic contribution trend. 

Numerical simulations involving different proteins support the conjecture 
that these results are not limited to the MBP specific case, but are instead broad. 
For the sake of clarity, the approach is first presented referring to the MBP. Suc- 
cessively, the generality of the results is discussed by addressing the specific MBP 
mutant MalE262 and the globular protein 1A3H, which has been selected for its 
substantially different native topology. 


Nieft Npore Nright 
INA CRA, 
menù = rr <a go 


er. 40: 
Rye Q= Nright - Nieft 


N-pull OG 


Figure 4.2: Here and in the following figures units are internal, i.e. related to the coarse- 
grained model we used. Distances, when present, are in A. Panel A: Niet, Nright and 
Npore are the number of monomers on the left side, right side and inside the pore. It 
is always conventionally assumed that the first monomer entering the left side is the C- 
terminus (dark gray) while the translocation from the right side are pulled from the N- 
terminus (light gray). Hence the protein translocates left-to-right or right-to-left for C- 
and N-pulling, respectively. Panel B: Free-energy (G) profile from umbrella sampling 
simulations of MBP in the collective variable Q = Nrignt — Nieft. The curves represent 
the average residence time 7 (Q) for non-equilibrium MD simulations (filled) and for the 
1D approach based on the Smoluchowski equation (4.1) (dashed). For N-pull the 7,(Q) 
scale is reversed (the values being reported in parentheses and in light gray on the right 
y-axes) while Tẹ (Q) stemming from the Smoluchowski equation are rescaled in order to 
match the background values. 


The average residence time 7,(Q) that the MBP (R, = 7.5À in the present 
investigation) spends in the configurations associated with Q during the translo- 
cation (Fig. 4.2B), clearly indicates that the transport is not uniform. Indeed, 
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most time is spent in the same specific states found in chapter 3 for Re = 6.8A 
(events C-stl and C-st2 for C-pulling and N-stl, N-st2, N-st3 for N-pulling). In 
Figure 4.2B the free-energy profile G(Q) is also reported: the barriers of G(Q) 
correspond to the stalling events. To obtain the profile we employ the procedure 
described in section 3 by using as initial configurations both the conformations 
relative to the C and N-pulling translocation runs. In fact, from the MBP-G(Q) 
profile obtained by using as initial configurations only those from C-pull runs 
and reported in dark gray in the upper left panel of Figure 4.3, it is clear that the 
average ascending trend in the region Q < 0 does not correspond to an anal- 
ogous average descending trend for Q > 0. Since outside the pore (Q = -370 
and Q = 370) the thermodynamics conditions are the same for both sides of 
the channel, the relative free energy values should be equal. Such a prediction is 
not reproduced by a profile achieved by using only initial configurations which 
belong to either C or N-terminus translocation runs (single profile). This is pre- 
sumably due to the long time needed for the refolding process at the trans side 
to take place. Indeed, the same behavior is confirmed for the MBP N-pull runs 
(light gray curve in the middle panel of Fig. 4.3 left side): for Q > 0 (we conven- 
tionally consider the N-terminus pulled translocations to proceed from right to 
left) the ascending trend overwhelms the decreasing one observed when Q < 0. 
In order to obtain a single representative G(Q) profile for the entire process, half 


Figure 4.3: Free energy profiles G(Q) obtained from umbrella sampling runs employing 
initial configuration taken only from C-pull (dark gray) or N-pull (light gray) transloca- 
tion runs. The black line in the lower panel is the profile obtained using the two previous 
profiles half by half. The left panels refer to MBP and the right ones to 1A3H. 


of the umbrella sampling runs obtained from the C-terminus pulling initial con- 
ditions (Qo < 0) are combined with the complementary simulations relevant to 
the N-terminus case (Qo > 0). The resulting profile is plotted in the lower panel 
in Figure 4.3 and it is the one reported in Figure 4.2B. The right panels of the 
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former figure refer to 1A3H, for which the same approach holds. As a further 
indication that the anomalies of the single profiles are associated with the time 
needed for the refolding process, we report in Figure 4.4 the free energy profile 
for an unfolded protein, Re = 3.0A obtained from umbrella sampling simula- 
tions exploiting just the C-pull run initial configurations. In this case G ~ 0 also 
for Q = 370. 


Figure 4.4: G(Q) profile for denatured MBP protein (Re = 3A) obtained from umbrella 
sampling employing initial configuration from C-pull translocation runs only. 


As common in translocation problems, the free energy trend is used in the 
practical implementation of the Smoluchowski equation in order to compare 
the non-equilibrium MD runs with the 1D continuum approach. In this per- 
spective, a suitable mapping of the importing force Fẹ employed in the MD 
runs on the effective force Fg = 0W/0Q used in the stochastic model is re- 
quired. A gross estimate for Fg can be obtained by assuming that, in the MD 
case, each position of the application point of the force along the x-axis corre- 
sponds to a unique value of the collective variable Q. In principle, from translo- 
cation experiments, it is possible to extract the conditional probability of ob- 
serving Q as a function of the force application point x, P (Q|r). It turns out 
that P(Q|x) is sharply peaked, thus allowing to use a (monotonically increas- 
ing) deterministic function to map the MD onto the 1D-approach forces by us- 
ing Fg = (0W/0x)(0x/9Q) = F,0x/0Q. Indeed the molecular dynamics 
data show that when the pore is full (Ne; + 0 and N,;jgnt # 0) the distri- 
bution of Npore is tight around its average value, Nyore ~ 38 (Fig. 4.5) up- 
per panels. Hence, a single monomer entering from the cis side corresponds 
to a single monomer exiting from the trans side (single-file motion). The net 
variation of Q, AQ = 2, is associated to a displacement of the force applica- 
tion point substantially given by Ax ~ L/Npore ~ 2.5. Thus it follows that 
Fg ~ FyAx/AQ = 1.25F,. Instead, when either Nieft = 0 or Nrignt = 0, we 
get AQ ~ Land Fg = F,0x/0Q = 2.5F 5. 

The dashed lines in Figure 4.2B represent the average residence time m (Q) 
obtained by solving a set of ID Langevin runs governed by (4.3). It is clear that 
the peaks coincide with the stalls observed in the MD runs, thus confirming the 
accuracy of the 1D driven diffusion interpretation. 
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Figure 4.5: Upper left. Npore distribution for translocation runs (both C and N pulling) 
for the MBP. Upper right, same data for 1A3H. Lower Left. Scatter plot of the amino acid 
at the pore mouth on the cis-side as a function of the reaction coordinate Q during MBP 
translocation simulations (C-pull dark gray, N-pull light gray). The maps eqs. (4.8) and 
(4.9) are also depicted as straight dotted lines. Lower Right. Same data for the 1A3H 
translocations. 


It has just been shown that the Smoluchowski picture is appropriate to de- 
scribe the essence of the translocation process. Now we show also that the main 
features of both G(Q) and p(Q) are associated with specific properties of the 
molecule native structure. In particular, we observe that the stalling events are 
related to the entrance in the pore of the regions richer in long range contacts 
formed with the untranslocated portion of the protein. The main idea behind 
this approach is that only the interactions between the parts of the molecule that 
still dwell on the cis-side are effective in provoking the stalls. As a first attempt 
to quantify such a remark, we calculate for each amino acid the backward con- 
tact index B, i.e. the number of contacts that the issued amino acid forms with 
the still untranslocated residues. First of all it is necessary to establish a map be- 
tween Q and the amino acid at the pore mouth, in order to express directly the 
backward contact density as a function of Q, alike 7 and G. This view allows 
comparing the native contact clusters with the features of the free-energy pro- 
file, even without resorting to a dynamical analysis of the MD data. However, 
the average number of residues inside the pore, when it is full, is required to be, 
with a good approximation, a constant of the motion. In Figure 4.5 a scatter plot 
of Q versus the amino acid at the cis pore mouth, iczs , is also shown in the 
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lower panels. Such an amino acid is selected by running back upon the chain of 
the protein, starting from the foremost residue along the pore axis up to the first 
residue which x-coordinate falls in the range [-1 : 3]. Also, its distance from 
the pore axis has to be lower than the channel radius. For each Q the distribu- 
tion of the amino acids that are found at the pore mouth is very tight (the map is 
practically one to one and onto). The dashed lines show such a correspondence 
by assuming N pore = 38 and by using (4.8), or (4.9) to map Q and îcrs. The 
map for the C-pulling case is 


icrs € [1 : Npore] Q=icis-m, 


: 7 : s 4.8 
CIS € (Noore : m] Q = 2icrs = Npore =m, ( ) 

while the N-pulling map reads as 
icrs €[m:m- Npore] Q=-icrs, (4.9) 


icrs € (M- Npore : 1] Q =m - Npore — 2icrs. 


Such a correspondence is mainly used to map the backward contact index rel- 
ative to the amino acid at the pore mouth onto the reaction coordinate Q, by 
assuming that Nigra is a constant of the translocation dynamics. 

For a protein pulled from the N-terminus (first residue), the backward con- 
tact index for the i-th amino acid reads 


By = > Mi, (4.10) 


j=i+b 


where M;; is the contact matrix (M;; = 1 if residues è and j are in native con- 
tact and 0 elsewhere) and b; a threshold introduced in order to consider only the 
native contacts formed with the residues that are relatively far in the sequence. 
Here we use b; = 20, a distal distance that excludes, for instance, the interac- 
tions responsible for helix structures. Indeed, such structures do not play a con- 
siderable réle since helices are not required to completely unfold to enter the 
pore. A smoothing procedure (running average) is then performed in the inter- 
val [i- 5: é+ 5]. The resulting function is indicated as BY), Such a procedure 
is a refinement of what already illustrated in section 5 as, on one hand, only the 
backward distal contacts are considered and, on the other hand, no dynamical 
data are required aside from a preliminary evaluation of Npore. An analogous 
expression holds for C-pulling: 


i—by 
BOO = Y Mg. (4.11) 
j=l 


Of course, when the amino acid 7 is supposed to be at the pore mouth, we get 
i = icrs and the maps (4.8) or (4.9) between ic7g and Q hold. In Figure 4.6 
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the smoothed version B of the backward contact index B is reported for C and 
N-pull from just a static analysis of the MBP contact map. It is noticeable that 
the ascending ramps in the profile correspond with the peaks in the backward 
contact density B. 

In order to confirm this picture, we remove selectively the native contacts 
own of the amino acids 263 and 264 (two residues mainly involved in the stall 
C-st2) and repeat both the non-equilibrium MD translocations and the entire 
umbrella sampling protocol. The result is that the residence time peak C-st2 is 
smaller (Fig.4.7) and, correspondingly, the G(Q) slope in that region is lowered 
(Fig. 4.6). 


etter nny 


Figure 4.6: Burial backward B for C (dark gray) and N-pull (light gray). The solid line is 
the free energy profile of the MBP (already reported in Fig. 4.2B) while the dashed one is 
the profile of the MBP-mutant, for which the native contacts of amino acids 263 and 264 
are removed. 
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Figure 4.7: Comparison of residence time m, as a function of the collective variable Q for 
wild-type MBP (dashed lines) and MalE262 (solid lines) for both C (dark gray) and N 
(light gray) pulling. 


To understand if the triple correspondence between average residence time 
T»(Q) from non-equilibrium runs, free energy profile G(Q) and protein struc- 
tural features (i.e. the backward contact density B) is specific of the MBP or is a 
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more widespread property of protein-like structures, we repeat the analysis for 
a different globular molecule (1A3H) characterized by a substantially different 
native topology with respect to the MBP (see Fig. 4.8 for a comparison between 
contact maps; R, = 7.5À for the MBP and R, = 7.2À for 1A3H). In particular, 
native contacts are more uniformly distributed for the MBP and are mainly as- 
sociated with either distal clusters or -sheet interactions. In contrast, for the 
1A3H molecule, Lennard-Jones potentials concern primarily adjacent a-helices, 
in addition to a small set of clusters that the region close to the N-terminus forms 
with the rest of the structure. In the bottom panel of Figure 4.8 the quenched 
contacts in MalE262 are highlighted in dark gray. The protein 1A3H is formed 
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Figure 4.8: Native contacts are reported in transparent light gray circles. The dark gray 
circles in the bottom panel correspond to the fourteen contacts removed in order to obtain 
the mutated Maltose Binding Protein, MalE262. 


by 300 residues (hence Q € [-300, 300]). In Figure 4.9, G, 7 and B for both C 
(dark gray) and N-pull (light gray) are reported as a function of Q. The dynam- 
ics, similarly to the MBP, is characterized by several stalls. For both N-pull and 
C-pull it appears that the peaks in (Q) coincide with the ones in the corre- 
sponding B trend. Moreover, the larger clusters of native contacts are associated 
with the barriers in G(Q). 

Here the Smoluchowski picture (dashed lines) is appropriate to summarize 
the 3D phenomenology, mainly for the C-pulling case, as the N-pull is charac- 
terized by a very high value of the critical force. Such a force probably reduces 
the suitability of the 1D approach in the equilibrium potential dictated by the 
free energy profile, at least from a quantitative point of view. In Table 4.1 critical 
forces F, are summarized for the translocation simulations performed. For the 
mutated MBP the critical force can be assumed to be equal to the one pertinent 
to the wild type structure. 
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Figure 4.9: 1A3H. The filled curves in the main panel represent the average residence time 
ty from 3D MD translocation runs, while the dashed lines represent the outcome of the 
1D Smoluchowski approach. The solid line is the free energy profile G(Q) from umbrella 
sampling simulations. The upper and lower panels report the backward contact index B 
for C (dark gray) and N-pull (light gray). 


Protein, Pull | Fe 
MBP C-pull | 2.1 
MBP N-pull | 2.9 
1A3H C-pull | 1.9 
1A3H N-pull | 3.6 


Table 4.1: Critical forces for the different MD translocation runs performed. The critical 
force of the mutated MBP coincides, within the statistical error, with the critical force of 
the corresponding wild type structure. 


Finally, we analyze the different contributions to G(Q) in order to under- 
stand what are the dominant ones for the observed phenomenology. In Fig- 
ure 4.10A, the total internal energy Vas, the free energy G and the entropic 
contribution T'S are reported for the wild type MBP as a function of Q. The 
energetic contributions are calculated as conditional average by using the con- 
figurations from the umbrella sampling runs. The entropic term is TS(Q) = 
Vas(Q) - G(Q). All contributions are conventionally set to zero for Q = -370 
(protein on the left side of the pore). During the transport Vga increases since 
the protein needs to unfold to translocate and, consequently, native contacts are 
stressed up to rupture. A similar behavior is found for the entropy. Indeed the 
break-up of native contacts results in a less compact structure and in a disorder 
increment, up to a maximum corresponding to the protein straddling the pore 
symmetrically (the unfolding associated with contact detachment overwhelms 
the confinement effect of the channel). Entropic and energetic contributions are 
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closely correlated one another and, as usual, with the clusters of native contacts. 
Also, the entropy profile differs from the one pertinent to an unstructured poly- 
mer, which is provided in Figure 4.11 upper panel (MBP R. = 3.0À). As it is 
expected, the entropic term shows just a barrier at the beginning of the translo- 
cation, i.e. up to the exit of the first residue on the trans side, after which the 
landscape is practically flat. Panel B of Figure 4.10 reports the main contributions 
to Vga, namely Vnat and Vy. The dashed line V,*,, marks a heuristic estimation 
for Vat, obtained by integrating the balanced version of the backward contact 
density subsequently described. 


Figure 4.10: Upper: Energetic (dark gray circles) and entropic (light gray squares) contri- 
butions to the free energy profile G(Q) for wild type MBP. Lower: Main contributions to 
Vas: Vnat (light gray diamonds) and Vy (dark gray triangles). The dashed line represent 
the heuristic estimation of Vi: based on the backward contact density. 


For the sake of clarity, a translocation pulled from the C-terminus is ex- 
emplified. A quantitatively identical result can be obtained by considering the 
N-pull translocation. As first step we calculate the quantity 


BIN (Q) = BO (Qa) - BAU + Npore)); (4.12) 
where the term BON) (Q(i+ Npore)) represents the contribution to the internal 


energy due to the refolding on the trans side. In fact, when residue 7 dwells at 
the pore mouth on the cis-side, residue = è + Npore dwells close by the trans-side 
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pore entrance. Integration of (4.12) provides an estimation for the net number of 
broken contacts during the translocation. Since each contact amounts for an en- 
ergy jump equal to 1, a rough estimation of the native contribution to the internal 
energy can be calculated as 


Q ~ ays 
ViialQ) = f BO (aaa. (4.13) 


The approach is a further confirmation that the essence of the translocation is 
hidden in the structural properties of the native configurations, which are in 
turn mainly determined by the long range native interactions. 

To conclude, we mention that analogous results in terms of energetic and 
entropic contributions hold for the 1A3H protein (Fig. 4.11 lower panel), thus 
confirming the overall picture. 
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Figure 4.11: Energetic (dark gray circles) and entropic (light gray squares) contributions 
to the free energy profile G(Q) for unfolded MBP (Re = 3A upper panel) and for 1A3H 
(lower panel). 


5. Remarks 


In this chapter it has been show that the stalling events that characterize 
the dynamics of a protein-like structure in a single-file translocation are associ- 
ated to the structural properties of the native configuration. The general setting 
holds for all the cases analyzed. However, the stall sequence is specific for each 
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protein and constitutes a sort of signature of the molecule. The identification of 
the features responsible for the bottlenecks of the transport allow us to formu- 
late a heuristic procedure able to estimate the native bound contribution to the 
free energy profile, on the sole basis of the protein native structure and Npore, 
by simply integrating a balanced version of the backward contact index profile. 
These occurrences appear to be generic, not associated just to a specific globular 
protein, thus opening the way to systematic pre-screening of the proteome. 

Also, the intrinsic difference between folded and unfolded structures in 
terms of the entropic contribution relevant to the translocation process has 
been pointed out. In the former case the confinement effect due the pore 
appears to be overcome by the severance of native contacts, hence confirming 
the importance and leading role of the long range interactions in this kind of 
dynamics. On the other hand, for unstructured polymers, the free energy 
landscape is dominated by a single entropic barrier at the beginning of the 
transport. 

Finally, the one-dimensional interpretation of the process as a driven dif- 
fusion along a proper potential of the mean force provides overall good results. 
Indeed, it retains the essence of the 3D translocation dynamics like, for exam- 
ple, the bottleneck scenario. However, quantitative comparisons appear to be 
too ambitious, especially if the critical force pertinent to the transport is high, 
as for the 1A3H N-terminus pulling. In fact, in this case, once the first stall has 
been overcome, the translocation proceeds extremely fast and hence it is poorly 
described by a quasi-static approach. 
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Chapter 5 
3-Dimensional Continuum Models 


1. Preamble 


In the present chapter we propose a continuum model for the description 
of the dynamics of isolated macromolecules. We adopt a second-rank tensor as 
a descriptor of the macromolecular shape and identify the action governing its 
dynamics by means of an identification procedure from a discrete scheme, based 
on power equivalence and Cauchy-Born rule. We compare molecular dynam- 
ics stretching simulations with the continuum model by starting from discrete 
toy schemes, going on with increasing complexity, and ending with the analysis 
of the already encountered Ubiquitin protein. The results indicate limitations 
in the approach in case of unconstrained molecular dynamics while they show 
appropriateness for driven dynamics. 

In the second part of the chapter, simulations relevant to the MBP protein 
are also presented. In this case stretching and, above all, translocation dynamics 
are considered. The selected approach is able to capture the shape evolution of 
the molecule, providing that the pore is long enough to provoke a considerable 
stretching. However, limitations totally analogous to the Ubiquitin protein (and 
hence not shown for brevity) still hold for equilibration simulations. 

Such shortcomings are ovecome thanks to empirical formulations of the 
morphogical descriptor, or rather of the quantity that govers its dynamics, as 
we will see. 


2. Introduction 


Biological materials have in general intricate architecture at finer spatial 
scales. Multi-scale and multi-field views are often called upon to offer a setting 
for the description of their mechanical behavior. Even the analysis of processes 
involving a single macromolecule requires care. This is the case, for example, 
of protein stretching in free space (performed, for example, in atomic force 
microscope experiments). In analyzing such a process, both discrete and 
continuous views are adopted. Numerical simulations on discrete schemes 
(both full atoms and coarse-grained molecular dynamics) are developed to 
analyze stretching and transport (see, e.g., [54, 55, 123, 126, 127]), and to 
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determine free-energy profiles (see, e.g., [11, 55, 128, 129]) which are used often 
to obtain closed form solutions for first passage time statistics (see [57] for 
relevant theoretical issues). 

Models at continuum level are then used to summarize the results of numer- 
ical simulations, by reducing the dynamics of discrete material points to the one 
of a single descriptor of the molecule. The common view is that of using data to 
describe just the motion of the molecular mass center or of a similar collective 
variable leading to one-dimensional dynamics (see, e.g., [11, 55], and references 
therein). No information on molecular shape evolution in time are added. 

Here we propose a continuum model which captures, at least in a coarse 
way, the shape evolution of an isolated protein. So, we attribute to the molecule 
a (time-dependent) second-rank tensor v with positive determinant as a descrip- 
tor of its shape. v does not coincide with the inertia tensor. It is a linear operator 
that we define on the basis of Cauchy-Born rule. In fact, we imagine to fix a 
shape of the molecule that we take as a reference, and represent it by a discrete 
set of material points. v is then defined as the tensor such that the velocity of the 
i-th material point is given by the time rate of v applied to the i-th reference 
position vector. 

The evolution of v is governed by a balance of (microscopic) actions on the 
molecule. At continuum level we identify such actions in terms of the ones oc- 
curring between every material point and its first, second, and third neighbors 
in the discrete scheme, in addition to possible long-range interactions. The iden- 
tification is made in terms of power equivalence. The values of the actions in the 
discrete model are obtained by numerical simulations. Then the continuum evo- 
lution of v is depicted in terms of the eigenvalues of the symmetric component 
of its polar decomposition. 

We compare molecular dynamics stretching simulations with the contin- 
uum model. We consider first discrete toy schemes and go on with increas- 
ing complexity, then we end with the analysis of the Ubiquitin protein. The re- 
sults indicate limitations in the approach in case of free dynamics of the macro- 
molecule while they show appropriateness for constrained dynamics, allowing 
us to accept the model for a number of constrained motions such as protein me- 
chanical unfolding. 

Finally, MBP translocation is also taken into account (there is not substantial 
difference between Ubiquitin and MBP thermalization and mechanical stretch- 
ing results, hence MBP data are not reported for brevity, aside from just one re- 
mark in sec. 4.4). We shown that the appropriateness of the approcah still holds 
in MBP driven and confined dynamics. At the end of the chapter, we also point 
out that the limitations concerning free equilibration are overcome by selecting 
empirical formulations of the so-called self-action (a quantity that, in the present 
setting, governs the evolution of v). 
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3. Models and methods 
3.1. Discrete scheme 


The Ubiquitin protein in Figure 5.1 is described by the already introduced 
(sec. 2.1 of chapter 3) Ca backbone coarse-grained G6-like model (see [1, 100] 
for the original approach, [2] for the refined version implemented here and also 
[2, 55, 56, 114], for its use in describing protein stretching, thermalization and 
translocation). The scheme is also equivalent to what is presented in [55] where 
the same molecule (PDB-id:lubq) is considered. Values of the parameters ap- 
pearing in protein potentials and measure scales are reported also in [55]. If not 


Figure 5.1: From left to right: full-atom picture, secondary structure features and coarse- 
grained Ca carbon atoms along the backbone chain. 


differently specified, we assume also R, = 3.0A or Re = 6.5A as values of the 
Go model cut-off radius. The Go potentials (see sec. 2.1 of chapter 3) constitute 
a basis for modeling atomic interactions not only in the Ubiquitin structure but 
also in the other sets of atoms in space, as it will be pointed out in what follows 
when necessary. The same approach holds also for the MBP. 


3.2. Continuum model 


As anticipated in the introduction, at continuum level the dynamics of a 
single protein is commonly depicted just by evaluating the subsequent positions 
in time of its centre of mass (see e.g. [11]). A more refined natural descriptor of 
the molecular morphology is the moment of inertia of the atomic cluster. When 
we use it as descriptor of the molecule, as Ericksen suggested in [58], its time 
evolution involves the sum of the tensor moment of momentum of each single 
atom constituting it. 

We do not follow such a view, but we are inspired by it and select a second- 
rank tensor as a descriptor of the molecular shape, a tensor associated with an 
idea of homogeneous deformation of a region containing the molecule in a shape 


81 


Coarse-grained molecular dynamics and continuum models 


that we take as a reference (in this sense we are closer to what suggested in [59] 
for the mechanics of granular media). So, we select a compact region e in three- 
dimensional space R? - the volume of e is indicated by |e| - as a large enough 
portion of space containing the molecule in a shape that we take as reference 
(the equivalent sphere radius ranges in the interval [15,25] A, about one and a 
half to two times greater than the Ubiquitin gyration radius). Let r; be the vec- 
tor position of the i-th atom - or material point if we refer to a coarse discrete 
representation of the protein — in e in the reference place. We consider a second- 
rank tensor v with positive determinant, which depends on time t, such that at 
the instant t the velocity v; of the i-th material point in the discrete scheme is 
simply given by wr; - essentially we are imagining that e is endowed with an 
homogeneous strain rate and the atoms of the molecules are dragged along the 
process by the homogeneous deformation. 

The assumption has to be interpreted ‘in average. It has clear limitations 
because the unfolding of a protein can develop in a non-homogeneous way by 
involving subsequent portions. However, the assumption 


Vj = UTi, (5.1) 


which is motivated by Cauchy-Born rule’, allows us to identify clearly the action 
of the protein over itself. 

We presume also that v be endowed with positive determinant, so that, by 
polar decomposition theorem, we can write 


v = RU, (5.2) 


where R e SO (3) and U is a second-rank symmetric tensor. 

In the discrete representation of the macromolecule, as described previously, 
call f;; the action (a force, a covector indeed, that we think derived from a poten- 
tial, the one introduced previously) between the material points 7 and j, applied 
to the point i. The density of power Tdiser developed in the discrete scheme is 


then given by 
DERZ 


i+jee 


Tdiser *= 


1 
In the continuum representation, the rate v has a conjugated action which is de- 
fined by the power 7. which is necessary to develop over the molecule, consid- 


ered as a whole, to get the rate ù itself. Since ù is a tensor, the power-conjugated 
action is a tensor too — call it Zs — so that 7, is defined by 


To t= ZaD. 


1See [130] and [131] for extended analyses of such a rule. 
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Moreover, since Z summarizes the actions in the protein (it is a self-action, in- 
deed), and we are not considering for the moment other actions over it, interac- 
tion with the external environment being assigned, in this case, by applied strain 
to the ends of the protein, z, should satisfy the balance equation 


z,=0. (5.3) 


However, in principle Zs could be imagined as the sum of a conservative part, 
zs, and a dissipative one, 2%, dissipation being developed during the evolution 
of the molecule - so we should have Zs = zs + 2°. The dissipative nature of 2° 
would be then expressed by the inequality 


which should hold for any choice of ù, implying so an expression for z% of the 


type 
zł = av’, (5.4) 


with a a positive constant and ù’ the dual version of ù. Precisely, with e1, e2, €3 a 
vector basis in the three-dimensional ambient space, and et, e”, e3 its dual coun- 
terpart, we have ù = ù" e; ®e; and ù’ = ui e Qef. The reason of the occurrence 
of Ù’ in (5.4) is that this way the product aù’ - è = aù} ù" is naturally defined 
without the additional structure of the scalar product. However, in what follows 
we shall not make difference between covariant and contravariant components 
because we shall always tacitly use the identification of R? with its dual. As a 


consequence the balance (5.3) would reduce to the evolution equation 
aÙ = — 2g. 


In particular, for the sake of simplicity we shall assume later that a = 1, so that 
the evolution equation that we shall consider will be 


p==i (5.5) 


If so, the power me would result the sum 
ME. 
Me = 2% dt ole ; 


The problem is now the explicit expression of zs in terms of the discrete actions 
fiz. To get it, we impose here the identification of naiser With zs - ù: the power 
of the (potential dependent) actions in the discrete scheme should be the one of 
the conservative part of the self-action. We then write 


>) figo 


isjee 


1 


Zs’ Ď = 
[el 
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Then, thanks to the arbitrariness of ù, by using (5.1) we finally get 


= C > fij Bri. (5.6) 


| lagged 


In presence of (conservative) external forces on single material points of the dis- 
crete representation of the molecule, the previous expression of z, should be 
augmented by the tensor product of these forces by the position vectors of the 
material points where they are applied. So, if fx is the (conservative) force ap- 
plied to the k-th material point, with k = 1, 2, ..., q, with q the number of atoms 
where external action act, we then have” 


Zs -alz fons Y ion) 


izjee 


Since we are interested in the changes in molecular shape, in the numerical anal- 
yses presented later we shall compute essentially the eigenvalues A of the sym- 
metric tensor U appearing in the polar decomposition of v, and determining the 
semi-axes of an ellipsoid in three-dimensional space. 

Finally, in what follows, the placement vectors r; might be referred to dif- 
ferent frames. However, in the case of Ubiquitin and Maltose Binding proteins, 
only placement vectors with respect to the center of mass of the molecules will 
be considered. Obviously, explicit coordinates will depend on the frame selected 
to evaluate both the center of mass and bead positions. 


3.3. Stretching simulations: methods 


We summarize here the methods that we use in developing later numerical 
simulations. 

Stretching simulations are performed by means of a scheme based on con- 
stant velocity steered molecular dynamics (see [117]). Initial conditions are ob- 
tained by the so-called ‘free equilibrium. At the beginning of each numerical test 
the tensor v is set to be equal to the identity. Protein detachment is achieved by 
a spring of elastic constant k = 1e, linking a fictitious atom with a terminus of 
the molecule. The fictitious atom is pulled at constant velocity. Its motion drags 
the terminus where it is bonded. We perform two different stretching protocols. 


1. In the first scheme one end is held fixed in space and only the other one is 
pulled away. 


2. In the second approach both ends of the molecule are dragged with oppo- 
site velocities. 


2For an enlarged view on this identification procedure in the setting of mechanics of complex 
materials, see [132]. 
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The stretching direction coincides with the vector joining the two ends of the 
molecule in the initial configuration. We investigate different stretching veloc- 
ities Vstr, temperatures T and cut-off radii Re. When maximum elongation is 
reached (the molecule is in an almost single-file conformation), the protein can 
be either kept stretched or left free to fluctuate. 

For the other smaller sets of atoms, either the fictitious atom external spring 
(the one connecting the fictitious material point to the molecule) or the direct 
enforcement of a suited constant velocity motion to some points of the system 
have been used as strategies to obtain a deformation dynamics. 


3.4. Translocation simulations: methods 


As far as MBP translocation simulations are concerned, nothing changes 
from what described in sec. 3.1 but the pore length, which ranges in the interval 
[100 : 800].A, depending on the simulation perfomed. 


4. Results 
4.1. Toy simulations and related remarks 


Before looking at the dynamics of the entire Ubiquitin molecule at 
discrete scale and within the continuum approximation proposed, we analyze 
some simpler schemes - toys, indeed - to investigate the limits of the 
discrete-to-continuum identification that we have adopted here. 


1D biatomic systems. First we consider the evolution of v for a trivial har- 
monic oscillator, composed by two material points, indicated by A and B, and 
connected by a spring of elastic constant kp. In this case v, defined previously, 
reduces to a scalar. The scheme, sketched in panel a of Figure 5.2, allows us 
to make some elementary predictions that are in agreement with the numerical 
analyses. The simulation temperature is set equal to zero and constant velocities 
are assigned to the two atoms. 

Let ro be half of the inter-atomic distance when the spring between points 
A and Bisat rest (in other words, in a configuration that we set as a reference), 
that is r4(t = 0) = -ro and the same is for B. 

When two equal and opposite velocities +vo(t) are assigned to points A and 
B, for the motion we get 


ra(t)=-ro- foe - to)dr, rp(t)=-ra(t). 
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Figure 5.2: Two-atoms system. Here and in all the other figures lengths are expressed in A 
and the time in the relevant ‘code unit’ Panel a: Sketch of the two-atoms mass-spring sys- 
tem. Panel b: Evolution of v, a scalar in this case, for a standard elongation/compression 
cycle, run twice. The asymmetric evolution of v would generate an almost-monotonic 
trend if the cycle was performed over and over again. The evolution of v is here com- 
pared with the radius of gyration rgir of the set of atoms. Panel c: It is the same of panel b 
except that here the specular evolution of v is achieved because internal distances re ap- 
pearing in the expression of zs are kept constant to the reference values during the whole 
dynamics. We note a delay in the evolution of v with respect to the trend of the radius of 
gyration as the former keeps on increasing (or decreasing) during the approaching phases 
(i.e. when the system is moving toward the rest position from maximum detachment or 
compression). Also, v never shrinks further than its initial state. Panel d: Effect of the 
velocity on the evolution of v. Despite an overall equivalent deformation testified by the 
evolution of the gyration radii in the inset, the amplitude of the trends of the ellipsoid is 
different if the velocity is changed, i.e. it is dependent on the rate of deformation. 


By indicating with f4g the force that point A exerts on point B, we can write 


fap =—kn(re(t)-ra(t)-2ro), fea =—fas- 


We develop an elongation cycle in which points A and B are pulled far apart by 
imposing opposite constant velocities +vo to A and B. The initial position with 
the atoms at rest is obviously regained by reversing the velocities for the same 
amount of time spent in the elongation phase, i.e. te = ta, where the index e 
indicates the elongation time and a the time spent in the re-approaching process. 
The analysis of the previous equations is trivial and the solution can be plugged 
into the definition of zs (where for simplicity we take |e| = 1). 
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Assume to = 0. From the coarse grained balance of overall actions in the 
continuum scheme, we find two expressions for zs that we label with the apex 
e in the elongation phase, and with a when the two particles re-approach one 
another by reverse velocity. We get 


25 = —4kn(rovot + vat?) 5 a. = -2kn (vot = 2ro)?. 


Time integration of (5.5) along an extension-compression cycle starting from an 
initial value vo then leads to 


te 
Ve = Vo + 1 4knvo(roT + vot? )dr 
0 


t 1 10 
= Vo + Akn (voro + 3 vote) = 3 Antero 


ta 
Va = Ve + i 2kp,(voT — 2ro) dT > Ve. 
te 


The last inequality arises for 2° is negative in the whole approaching process. So, 
we have a drawback: the evolution of v is monotonic although in the approach- 
ing phase the two atoms are getting closer toward the initial configuration. 
Also, if we we start compressing, the effect on v is in magnitude different from 
the extension dynamics. Indeed the expression of zs for such a case is 


Zs = 2kprovot — 2kuft 


The two terms on the right hand side are opposite in sign: the evolution of v 
results smoother and slower than in the elongation phase. In other words, a 
symmetric deformation process does not generate a symmetric evolution of v. 
First v does not recover the initial condition once the two atoms are back to the 
rest position. It also does not regain the starting value even when a compression 
follows the initial elongation, as clearly depicted in panel b in Figure 5.2. In other 
words, v never ‘shrinks’ below the initial value. The evolution process quickly 
degenerates in an almost monotonic trend, ifthe cycle elongation-contraction is 
repeated several times. Moreover, if opposite sinusoidal motions are imposed to 
points A and B, in place of the constant velocity drifts, a complete monotonic 
evolution of v arises, as it results from the subsequent elementary analysis. 

In the sinusoidal case, the harmonic displacements (constants do not have 
here a prominent ròle) are given by 


rg(t)=ro+Dsin(D'(t-to)), ra(t) =-ra(t) 
The forces exerted between the points are then 


fap(t) = -ka(ra(t) -ra(t) - 270) = -2karp(t) 
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and 
fea(t) = -fap(t). 


Zs is given by 
z(t) = faz(t)rs(t) + fea(t)ra(t) = 2faB(t)rgB(t) x -sin (t). 


By integrating in time (5.5), we find that v diverges, namely 


t 
v(t) oc f sin?(r)dr œ t, 
0 


thus we have again a bounded oscillation for the material points on one hand, 
and an unbounded evolution on the other hand, if t > oo. 

An attempt to eliminate this inadequacy can be the decision to fix r4(t) = 
-ro and rg(t) = ro, an internal constraint, indeed. In this case, in the simple 1D 
harmonic case that we treat here, the evolution of v would result just governed 
by the forces determined by the internal spring, and it would be bounded and 
symmetric. But there would still exist an half-a-period delay between the actual 
overall shape of the system and the associated trend of v at the continuum level. 
Namely, the evolution would be still monotonic during the elongation and con- 
traction phases, obviously for the sign of the internal forces does not change (see 
panel c in Fig. 5.2). 

From now on, all results will refer to the just described approach, namely 
the choice r;(t) = r;(0), i = 1,...,m, with m the number of material points in 
the cluster, if not differently stated. However, in section 4.3 we provide also an 
example of monotonic evolution of v for the Ubiquitin protein in the case this 
assumption does not hold, i.e. r; = r;(t). 

An additional shortcoming suggested by this simple 1D analysis arises di- 
rectly from the combination of equations (5.5) and (5.6). If we suppose that 
the same evolution is performed in different times (e.g. we perform a whole 
extension-contraction cycle once at velocity vp and once at velocity vo/2), al- 
though the same gross deformation occurs, different values of v are detected. In 
fact, in the slower case the same trend of zs spreads over a time that is twice the 
time of the fastest one. Hence the integral results greater (doubled if the self- 
action evolution is linear), as pointed out in panel d of Figure 5.2. 

Finally, it is obvious that if the biatomic 1D system considered in this section 
is kept elongated or compressed, its shape does not vary but, since the spring 
is loaded, zs is different from zero and v evolves independently (data are not 
reported here). 

If the interatomic potential among material points is changed to, for exam- 
ple, a Lennard-Jones one, as the one in (3.4), we do not find an improvement in 
the picture. Actually the very stiff Lennard-Jones potential makes it almost im- 
possible to achieve compression without huge spikes in the evolution of v (see 
panel a in Fig. 5.3). Therefore, to perform such an analysis, two extra-particles 
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Figure 5.3: Two-atoms system. Panel a: Trend of v when the interaction between neigh- 
boring atoms is governed by a Lennard-Jones potential. The latter one is very stiff and the 
associated v undergoes a very large peak in correspondence with the maximum compres- 
sion. The decreasing trend is practically negligible. Panel b: Evolution of v when external 
forces, due to the interaction between the system and the fictitious atoms that lead the de- 
formation process, is considered in the computation of the self-action. Panel c: Dynam- 
ics of v when, as above, the external forces are taken into account and the square-root of 
the reciprocal of the square root of the order parameter v_!/2 is the quantity chosen to 
describe the shape evolution of the system. Here the inner interaction potential is an har- 
monic covalent bond, but a similar outcome is detected also in case a stiff Lennard-Jones 
potential is enforced. Qualitative and quantitative agreements between v and the radius 
of gyration of the 1D system are achieved. 


are bonded through an harmonic potential to the atoms constituting the system. 
These fictitious atoms are driven at constant velocity instead of directly drifting 
the two inner points. In this approach we can either include or neglect the con- 
tribution of the external springs on the forces that act upon the inner atoms of 
the systems. Such a choice appears crucial as the external harmonic potentials 
end up governing the dynamics, and the related forces combined with the inter- 
nal ones do practically change sign as soon as the motion is reversed (see panel 
b in Fig. 5.3). This is confirmed not only for Lennard-Jones potentials but also 
when the inner forces are in turn harmonic potentials (see panel c, same figure). 
In this analysis (differently from panel b) we take the square-root of the recip- 
rocal of v, in analogy with the eigenvalues of the inertia matrix and the relevant 
inertia ellipsoid. This shrewdness allow us to achieve a trend in compliance with 
the radius of gyration and to avoid the amplitude mismatch observed in panel b. 
The order of magnitude of the quantities in comparison is also similar. 

In this 1D example there is no influence of temperature on the scheme we 
are using, providing that the effect of the external springs are not accounted for 
in the force vectors determining z,. Once the latter is included to achieve a clean 
representation of the numerical results, as the one in the just mentioned panel c, 
the toy scheme seems much more sensible to thermal fluctuations and to their 
ability to spoil the picture, due to the origin of non-homogeneous deformation 
terms. However low-enough values of the temperature do not remarkably alter 
stretching results. 
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Figure 5.4: First row: Dynamics of a homogeneous deformation for a four-atoms system. 
Each material point is pulled at constant velocity in an extension/compression cycle. Sec- 
ond row of pictures: Evolution of the eigenvalues A; of the symmetric part of v obtained 
by polar decomposition. The order parameter for the shape of the system never shrinks 
below the initial state despite the set of atoms undergoes a compression phase (third col- 
umn from the left). This shortcoming is common to all the situations where external 
forces are absent (this is due to the direct assignment of constant velocity motions to the 
points of the system) or not considered in the computation of the self-action. Obviously, 
since the deformation process is symmetric, v is a circle. 


2D four-atom system. Here we briefly mention the outcome of 2D simula- 
tions performed over a set of four atoms in space, placed at the corners of a 
square. Once homogeneous deformation is applied (all atoms are suitably driven 
by constant velocity motions) the overall basic picture of panel c in Figure 5.2 is 
achieved. In this elementary scheme all adjacent atoms are bonded by harmonic 
interactions. These ones are the sole actions of the model (see Fig. 5.4, panel a). 

In this case, v reduces to an ellipse, a circle when, starting from a square 
configuration, the four atoms are driven homogeneously, maintaining the square 
symmetry. Figure 5.4 displays the evolution in space of the atoms in the square 
lattice and the corresponding coarse representation in terms of v. The compres- 
sion phase is not described appropriately because the actual value of v represents 
an ellipse never smaller than the one corresponding to the initial condition. Also, 
at the end of a cycle in the simulation, the system is in a state equivalent to the 
initial one, while v does not reach the initial value. 

Previous results on the square lattice refer to a scheme where the reference 
positions of the atoms, i.e. the r; in (5.6), are kept constant to the initial values, 
and the interacting forces fij change. However, a similar outcome is detected 
also in the case the former quantities are let free to evolve, but the two eigenvalues 
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have different evolutions in magnitude. A similar non-symmetric behavior of the 
eigenvalues of v occurs also if the number of internal constraints is changed. 
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Figure 5.5: Panel a: Picture of the four-atoms set where an internal constraint is inserted. 
Panel b: Evolution of the semi axis of the ellipse for a homogeneous deformation totally 
equivalent to the one depicted in the first row of images of Fig. 5.4. The presence of the 
extra spring alters the dynamics of v, which semi axes A1 and Àz (eigenvalues of the 
symmetric matrix U stemming from polar decomposition of v) follow different trends in 
spite of the symmetry of the motion. 


Indeed, if an extra inner spring is added, as for example the one connecting 
atoms 2 and 4 in Figure 5.5, panel a, but the constant velocity stretching protocol 
is not varied (the deformation process depicted in the upper row of panels in 
Fig. 5.4 still holds), we obviously get a different evolution of v which changes 
from a circle to an ellipse (see Fig. 5.5, panel b): the two eigenvalues of v follow, 
in fact, different dynamics. 

So, at least for systems composed by a small number of particles, the expres- 
sion of z, that we have obtained does not allow the description in terms of v to 
capture appropriately the behavior in compression. However, if we perform the 
deformation process by using fictitious points, properly connected by springs to 
the molecule (see the first panel in Fig. 5.6), and we take into account the exter- 
nal forces generated this way, the results appear, at least qualitatively, acceptable. 
Also, they are qualitatively independent of the number of internal constraints, 
as summarized in Figure 5.6. Finally, we investigate how the picture changes if 
we alter the interaction potentials. We enlarge first the number of atoms in the 
two-dimensional lattice, increasing them up to 8, and consider Lennard-Jones 
interactions for two different values of the cut-off radius (see Fig. 5.7). For a cut- 
off R, = 3.0Å only adjacent residues generate an interaction. At Re = 6.5A all 
atoms interact one another. In Figure 5.7 the system is also depicted in the max- 
imum elongation and compression states, which correspond to labels (1) and (2) 
in panel a of Fig 5.8. Only when the external actions (the ones exerted by the 
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Figure 5.6: First row: Four-atoms system and fictitious beads driving deformation. The 
ghost atoms follow constant velocity motions and drift the system deformation thanks to 
harmonic external potentials. Maximum elongation states are also reported and labeled 
by (1) and (2) also in panel a. The overall shape of the system is here summarized by the 
magnitude of the edges of the limiting rectangle, bbz and bbz. The semi axes of v should be 
in agreement with the latter ones. Panel a: Comparison between the trends of the limiting 
box and the ellipse semi axes, dy 2 (A; are the eigenvalues of the symmetric part of v by 
polar decomposition). Here external forces due to the ghost atoms and relative external 
springs are considered in the computation of zs. Instead, the inner extra spring is absent 
(neither in the system itself nor, accordingly, in the computation of the self-action). The 
qualitative agreement is considerable. Panel b: The qualitative agreement does not change 
if the inner extra spring is added to the system and to the computation of the self-action, 
differently from what have observed when external forces are not taken into account in 
the expression of the self-action (Fig.5.5b). 


fictitious material points) are included in the computation of zs, the variation in 
time of v can be deemed in agreement with the real shape evolution of the lattice 
(the square-root of the reciprocals of the eigenvalues of the symmetric matrix U 
are again considered). 


4,2. Three-dimensional atomic clusters 


Previous examples have been developed just to clarify the stage and to pave 
the way to more intricate three-dimensional simulations, which are, at the end, 
the realistic setting we have to face. Here we report the results obtained from a 
simple cubic lattice — two extra atoms are placed in the centers of upper and lower 
faces to simplify the implementation of constant velocity stretching simulations 
(the scheme is in Fig. 5.1la) - and the protein Ubiquitin (see Fig. 5.1). 
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Figure 5.7: 2D eight-atoms system. The smaller cut-off radius used in the simulation is 
depicted to show that only adjacent points are bonded in this case, similarly to covalent 
bonding. The external fictitious apparatus and the limiting rectangular box are also easily 
detectable. Maximum stretching and compression are labeled with marks (1) and (2). The 
latter ones can be found also in panel a of Fig. 5.8 in correspondence with the evolution 
of v in these states. 


Cubic lattice. Simulation data arising from the analysis of the cubic lattice 
sketched in Figure 5.11a, confirm what has been shown for the two-dimensional 
systems. 


e The overall landscape depends, of course, on internal interactions among 
mass points. Passing from short to long-range Lennard-Jones potentials, 
to covalent harmonic bonding between adjacent atoms, the time evolution 
of v changes (Fig. 5.9). The results are (in general, indeed) only partially 
in compliance with the real shape evolution of the lattice (see for example 
panel c in Fig. 5.9). 


e The inclusion of the external forces exerted by fictitious atoms, the ones 
we use to impose deformation in the numerical simulations, is a crucial 
choice that commonly improves the adherence of the computational re- 
sults to what we have in mind to describe (see Fig. 5.10). In this setting the 
square-root of the reciprocals of the eigenvalues of U appears to be the 
best way to represent the semi axes of v. 


e Thermal fluctuations do not represent a hindrance, providing that lattice 
stability is not compromised (i.e. the condensed nature of the matter is 
not altered). The remark does not depend on the possible inclusion of 
external forces (see panel b in Fig. 5.10). However, as evidenced by the 
results in the same panel (i.e. t > 16000t,,, with tu indicating the time 
unit used in the computations), since thermal fluctuations are a source of 
non homogeneous deformation and random forces, in the long run they 
cause a sort of monotonic trend of the eigenvalues A; of U which is not in 
compliance with a lattice just vibrating. 
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Figure 5.8: 2D eight-atoms system. Panel a: Comparison between the evolution of the 
bounding box edges and v semi-axes. Here the picture refers to a set where short range 
(Re = 3.5À) Lennard-Jones potentials connect adjacent atoms. The deformation is en- 
forced by external fictitious beads and springs. Such forces are not accounted for in the 
computation of zs. The evolution of semi-axes does not summarize alone the shape of the 
set of points (there is qualitative agreement only in the compression phase (2)). Panel b: 
Once forces due to external springs are considered, the overall picture improves, provid- 
ing that da ? and Àz 1/2 are taken as the semi-axes of v. Panels c and d: Similar statements 
hold true if Re is greater (6.54). If external forces are not considered, the evolution of 
v degenerates into a 1D dynamics (only one of the eigenvalues actually evolves), panel c. 
Again, if we include external forces we recover accuracy and “robustness” in the scheme 
(panel d). 
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Figure 5.9: Panel a: Comparison between the first eigenvalue of U in (5.2) and the evo- 
lution of the edge of the cubic set of atoms oriented along the z-axis. Stretching and 
compression are performed along the latter axis by means of ghost atoms and springs, see 
Fig. 5.11 for snapshots of the system during an equivalent deformation path. There is not a 
close correlation between the two quantities displayed and a high peak in the evolution of 
the eigenvalue is generated close to the compression phase, presumably as a consequence 
of the stiff Lennard-Jones potentials employed in this simulation (Re = 12A, all atoms 
interact with each other). Panel b: Trends of second and third eigenvalues of v emerging 
in the simulation just described. If i * and az” are considered, the qualitative agree- 
ment with the edges evolutions is considerable. Unfortunately this approach does not im- 
prove the picture of the first eigenvalue. Panel c: The same stretching protocol is applied 
to the system when short range Lennard-Jones interactions are considered (Re = 5.75A). 
Similarly to the 2D case, short range Lennard-Jones potentials generate an evolution of 
v which is degenerate: in this case one of the three eigenvalues does not evolve and the 
picture associated with v is two-dimensional. Also, the trends of the two eigenvalues are 
equivalent and do not allow to catch which sides of the real shape are shortening and 
which ones are stretching. Panel d: The interaction potentials are in this case harmonic 
bonds between adjacent atoms of the cubic lattice. The picture of v is correct in the com- 
pression phase, that is in the central region of the plot 8000t,, < t < 13000t,,. During the 
stretching, instead, the three eigenvalues all follow a similar increasing/decreasing trend 
and none of them allow us to distinguish between which sides of the real shape are short- 
ening and which ones are stretching. Taking the square-root of the reciprocals of all the 
eigenvalues cannot fix this inadequacy as all trends would be reversed. 
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Figure 5.10: Panel a: Once external forces due to fictitious atoms and springs that are 
responsible for system deformation are included in the computation of the self-action, 
the overall picture results qualitatively in agreement with the dynamics of the three- 
dimensional cubic set of atoms under scrutiny. The agreement appears when harmonic 
bonds or short Lennard-Jones interactions are considered, but it is not always the case 
for high values of Rc. Panel b: Temperature effects. Low-enough values of thermal os- 
cillations (they do not alter the condensed state of the 3D matter) do not affect the time 
evolution of v during the stretching phases, but generate trends that are not in agreement 
with the shape evolution of the systems when the external deformation mechanism is no 
more applied. A slight increasing trend in the values of A; and A3 for t > 15000t,, can 
be evidenced, despite no overall deformation occurs. 


Figure 5.11: Panel a: General setting of the cubic lattice. Key quantities such as edge 
lengths and fictitious atoms, used to perform stretching by assignment of constant veloc- 
ity motions along the z-axis, are explicitly depicted. Panels b and c: Snapshots comparing 
the evolution of the cubic lattice and the one of v for a system that undergoes extension 
and compression phases. These images refer to a model in which harmonic potentials 
bond adjacent atoms and external forces have been used to compute the vectors fi; in 
the expression of the self-action. The value of the material element volume e has been 
tuned to achieve comparable trends also from the quantitative point of view. 
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Ubiquitin: mechanical stretching. Ubiquitin protein is composed by 76 
residues. The consequent increment of degrees of freedom does not give rise to 
simulation results differing remarkably in the qualitative features from the ones 
obtained from the simple lattices considered so far. 

In what follows, when we report results in the figures collected in this 
section, the values of the semi-axes of v, evolving in time, are compared with 
the smallest rectangular parallelepiped including the protein. Its edges, namely 
bb, , bbz, bb3, are evaluated in the central principal frame of the molecular 
structure (a discussion on the rôle of the coordinate system is also in sec. 4.3). 
Once again, the approach does not appear to be robust enough to achieve a 
proper description of the molecular shape unless we introduce (numerically) 
the external forces exploited in the deformation process, as it happens in the toy 
schemes discussed previously. However, we have no general proof that the ròle 
of the external forces in the simulation be always the one we have recognized 
here. 
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Figure 5.12: Panel a: Ubiquitin stretching simulation where the protein is kept elongated 
at the end of detachment (t > 5000t.,). It is evident that the time evolution of v is domi- 
nated by the trend of only one eigenvalue. The evolution of v is not in agreement with the 
one of the molecule shape as A1 shows a monotonic trend when the molecule is simply 
oscillating in its linear conformation (T = 0). Panel b: Once secondary structure terms 
in the overall potential (3.6) are neglected in the computation of the self action and the 
temperature is maintained equal to zero, a horizontal plateau is detected in the evolution 
of the main semi-axis of v. The evolution of Az” ? and a ? well describe the trends of 
the x and y axes of the limiting parallelepiped and viewed in the principal central system 
of inertia of the protein (inset). 


Figure 5.12 reports a comparison between two mechanical stretching simu- 
lations, where, at the end of detachment, the protein is kept elongated. To obtain 
an appropriate qualitative description of the molecular shape, it is necessary to 
neglect the contributions of the secondary structure in the computation of the 
vectors f;; that determine zs, and to set the temperature equal to zero (see panel 
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b in Fig. 5.12). The need appears, obviously, because the elongated configuration 
is not an equilibrium one. Internal forces do not vanish - the protein oscillates 
driven by the ground state of the potential (3.6) - hence the self-action does not 
vanish and determines the evolution of v (see panel a in Fig. 5.12). Only when 
the contribute of such forces is neglected we obtain what is depicted in panel 
b of the same figure. We just notice that the results improve if we consider the 
square-root of the reciprocals of the second and third eigenvalues, differently 
from the first one, although there is no evident physical reason to justify such 
a choice. The picture holds true providing that peptide bonds are not loaded, 
otherwise the horizontal plateau in the trend of A; would be impossible. Ther- 
mal fluctuations also compromise the evolution of v in a similar way. Also, such 
a shortcoming cannot be easily eliminated by including the external forces by 
which we generate numerically the deformation path, as they would simply add 
a contribute to two of the vectors fij, which does not guarantee that z, would 
vanish. Actually there is only one eigenvalue of U that evolves appreciably and 
governs the evolution of v, despite the analysis be three-dimensional. 

In analyzing the whole protein, to obtain a clear picture of the evolution of 
the molecular shape it is not always sufficient to pull just one end of it and include 
the external force in the computation of z,. As it appears in comparing panels a 
and b in Figure 5.13, only when both ends are pulled far apart from each other 
and the two external forces - the ones due to the springs that drag the molecule 
ends - are included together in the scheme (panel b), the picture is qualitatively 
in agreement with the real evolution of the protein shape (and it seems ‘robust’ as 
indicated by the additional examples in sec. 4.3). Figure 5.14 compares the shape 
of the Ubiquitin with its coarse approximation in terms of v. The approximation 
process based on the ideas underlying Cauchy-Born rule can be optimized by 
varying the amplitude of the normalization volume e. We can choose it, in fact, 
in order to achieve comparable amplitudes between the limiting parallelepiped 
and v, with the possible need of a smoothing algorithm in post-processing. 

The effect of temperature is usually negligible, during stretching and in the 
phase immediately consequent, but it influences the dynamics of v in the long 
run, as we have already affirmed for the toy models analyzed previously. In the 
scenario described in this section, changes in the cut-off radius, frame of refer- 
ence or stretching velocity do not induce prominent qualitative alterations. 


4.3. Additional remarks 


As we have repeatedly affirmed, the continuum approach proposed here, 
with the specification of structure of zs in terms of the actions in the discrete 
model, seems to capture the dynamics of a single macromolecule in presence of 
constraints imposed by the environment. So, it can be used in representing the 
mechanical stretching ofa protein in free space. For results relevant to transloca- 
tion dynamics in terms of the continuum model proposed here, see section 4.5. 
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Figure 5.13: Panel a: Data refer to a mechanical stretching simulation in which one end 
of the molecule is pulled at constant velocity by a fictitious bead and an external spring 
and the other end is held fixed in space. Despite the external force of the spring is in- 
cluded in the computation of the self-action, its presence is not enough to achieve an 
adequate representation of the molecule shape. Panel b: Only when mechanical stretch- 
ing is performed pulling both ends with equal and opposite velocities, including both the 
two external spring forces in the self-action, the behavior in time of v is in agreement with 
the real shape evolution of the protein. It is, as usual, necessary to take the square-root of 
the reciprocals of the eigenvalues as v semi axes. 


Figure 5.14: Quantitative comparison between discrete Ubiquitin dynamics and the evo- 
lution of v when both ends are pulled by external atoms. The material element volume 
has been optimized to allow the quantitative good agreement. Temperature is set equal to 
0. Panel a: Beginning of mechanical pulling. Panel b: Maximum stretching state. Panel 
c: Conformation of the protein after maximum detachment has been achieved and the 
protein is left free to refold onto its native structure. 


The scheme should be further enriched in case we would consider a de- 
forming environment and/or the presence of a dense population of interacting 
macromolecules. In the latter case the balance of actions (5.5) should include a 
microstress, due to the interactions of each molecule with the neighboring ones. 
Moreover, v should not necessarily be associated with a single molecule, rather it 
could be a representative ‘in average’ of a number of molecules occupying a por- 
tion of space considered as material element and assigned to a point in a multi- 
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scale and multi-field continuum representation of the macromolecular popula- 
tion (this way we should go along the guidelines of the mechanics of complex 
materials [3]). 

In any case, the inadequacy in presence of molecular free motion remains, 
as it will be shown in the conclusive part of the present section. 

It is possible to overcome it by adopting a second-rank tensor v as a descrip- 
tor of the single molecule with a different meaning from the one assigned here, 
and selecting an expression for z, different from (5.6) but inspired by it. Possible 
approaches and related results shall be discussed in section 5. 


Effect of time-dependent placement vectors for Ubiquitin. Figure 5.15 dis- 
plays the trends of the semi axes of the ellipsoid associated with U when place- 
ment vectors in the explicit expression of the self action zs depend on time, 
meaning they are computed at every time step on the actual configuration of 
the protein, r; = r;(t). It is clear that the monotonic behavior, theoretically pre- 
dicted from considerations on 1D systems, still holds for greater sets of atoms. 
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Figure 5.15: Trends of the eigenvalues of U in a thermalization simulation for the Ubiq- 
uitin protein when placement vectors ri in Zs vary in time. The behavior displayed by 
the analytical results for the biatomic system are confirmed also for the Ubiquitin pro- 
tein. Data refer to an equilibration temperature T = 0.5. The molecule maintains its 
folded state: the radius of gyration undergoes just small fluctuations around a constant 
value. The semi-axes of the morphological descriptor follow a monotonic evolution not 
in accordance with the protein shape dynamics. 


Frame of Reference. A few remarks on the way we have used frames of refer- 
ence in computations are collected here. 
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Figure 5.16: Comparison between U semi-axes for stretching runs relevant to the PDB 
frame and to the central principal system (PRI) for the Ubiquitin protein. Evolutions can 
be considered equivalent, despite an opposite trend for the intermediate eigenvalue that, 
however, results nugatory for the final shape of the morphological descriptor. Data refer 
to a simulation where both terminuses are dragged with opposite velocities, T = 0, and 
external fictitious forces are included in the computation of zs. 


When we consider just the reference values of the placement vectors r;, un- 
der rotations R € SO (3) of the ambient space (where we evaluate the vectors 
fiz) alone, zs becomes Rzs because the generic addendum fi; ® r; defining it 
changes into R fij ®r;, for r; is fixed once and for all in the reference space that is 
different from the ambient one and isomorphic to it, the isomorphism being sim- 
ply the identification. In principle, we could also consider rotations R’ € SO (3) 
in the reference space so that zs would change into Rz,R™ with the possibility 
of having Rz,R? when R’ = R. Tensor v follows analogous rules, being the 
integral in time of negative z,, according to the evolution equation Ù = — 24. 

In contrast, when we take r; in the current configuration, as in the previ- 
ous example, under rotations R of the ambient space zs is mapped into Rzs RT, 
and y, evaluated this time in the current shape of the molecule, undergoes an 
analogous change. 

Consequently, when we compute v in one frame and want to know it in 
another one rotating in time, we have to know the whole time-history of the 
rotation up to the instant considered. 

Here we have considered the frame of reference associated to the protein 
data bank file (PDB), which is the one where protein coordinates are provided 
and can be selected once for all at will in space. However, if we develop com- 
putations with respect to the principal inertial frame (PRI) of the molecule, the 
results do not alter the picture that we have shown in previous sections. Fig- 
ure 5.16 shows that the overall shape of the molecular descriptor is equivalent 
in both cases, in the present example, if quantities are computed with respect to 
the protein data bank reference or to the central principal frame, despite an op- 
posite trend in az” a Broadly speaking, on average, differences in between the 
two cases can be more remarkable than what reported here, nevertheless a final 
argument to prefer the central principal system to the PDB one is not apparent 
from our analyses. 
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Figure 5.17: Trends of the square root of the reciprocal of the eigenvalues of U in a ther- 
malization simulation of the Ubiquitin protein. Re = 3.04 and Re = 6.5A. The first 
three pictures show that the evolutions are practically the same. However, the protein 
with Re = 3.0A is unfolding, according to the comparison depicted in the last panel, 
where the trends of the radii of gyration of the two molecules is showed. The approach 
followed here does not allow to correctly detect thermal unfolding, a situation dominated 
by inhomogeneous deformation and random fluctuations that do not contribute to the 
overall shape evolution. 


Thermalization simulations. For each cut-off radius considered in previous 
analyses we have performed thermalization simulations at T = 0.500 (in the 
already mentioned code units) to measure the evolution of the molecular de- 
scriptor associated with the protein shape. The protein is free to fluctuate in the 
ambient space, following the Langevin dynamics already mentioned, with no 
driving forces nor additional constraints. Each simulation starts from the crys- 
tallized structure of the protein according to the PDB file. Unfolding is then 
allowed, providing that the value of the cut-off radius be small enough. Protein 
configurations are also regularly sampled to obtain statistical independent ini- 
tial states for mechanical stretching simulations. As it was expected, changes in 
the molecular shape, due to thermal equilibration, are not properly detected by 
the approach outlined in this work, since the deformation process is inhomoge- 
neous and not remarkable (the radius of gyration of a thermally unfolded protein 
is only about two times greater than the one own of a compact structure). In ad- 
dition, single-residue fluctuations play a prominent rôle at the atomic level, with- 
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out a corresponding macroscopic deformation. At the given equilibration tem- 
perature, structures with cut-off radius equal to 3.04 unfold, while if Re = 6.54 
they do not. Figure 5.17 depicts these remarks for the Ubiquitin protein. The 
trends of the semi-axes of U are similar and mostly indistinguishable, despite a 
protein undergoes thermal unfolding (as it results from the trends in the radius 
of gyration). A similar behavior is found in all the equilibration simulations that 
we have performed. 
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Figure 5.18: Panel a: Trends of the reciprocals of the square root of the eigenvalues of 
U ina stretching simulation of Ubiquitin compared with the dynamics of the smallest 
rectangular parallelepiped containing the molecule. The eigenvalues are here depicted 
as sequences of points, the values of the edge of the parallelepiped as continuous lines. 

e = 3.0A, T = 0, vstr = 0.05. Data refer to a simulation performed with respect to the 
principal central frame. Panel b: Analogous data than in panel a but T = 0.2. Molecular 
descriptor data are computed with respect to the PDB system of coordinates. 


Additional results for stretching simulations. We present here some addi- 
tional data which can indicate the statistical robustness of the results provided 
above, concerning the free-space stretching of the protein Ubiquitin when both 
ends are pulled at opposite constant velocities and external forces pertinent to 
the fictitious atoms are included in the computation of zs, see Figures 5.18-5.20. 
The material element volume e has not been optimized for quantitative compar- 
isons for the data below, as here we are just interested in furnishing a qualitative 
picture. Specifically, we have performed ten independent simulations for each 
cut-off radius (R<), temperature (T), and stretching velocity (Ustr) implemented 
(Re = 3.04 or Re = 6.54, T = O or T = 0.2, vetr = 0.05 or Vstr = 0.1). For 
all the cases, results can be deemed qualitatively equivalent to the few additional 
examples here provided. 
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Figure 5.19: Panel a: Trends of the reciprocals of the square root of the eigenvalues of 
U ina stretching simulation of Ubiquitin compared with the dynamics of the smallest 
rectangular parallelepiped containing the molecule. The eigenvalues are here depicted 
as sequences of points, the values of the edge of the parallelepiped as continuous lines. 
Re = 6.5A, T = 0, vst» = 0.05. Data refer to a simulation performed with respect to the 
principal central frame. Panel b: Analogous data than in panel a but T = 0.2. Molecular 
descriptor data are computed with respect to the PDB system of coordinates. 
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Figure 5.20: Panel a: Trends of the average over 10 runs of the reciprocals of the square 
root of the eigenvalues of U in stretching simulations of Ubiquitin (data relevant to the 
central principal frame). These quantities are compared with the average dynamics of 
the bounding box. The eigenvalues are here depicted as sequences of points, the values 
of the edges of the parallelepiped as continuous lines. Re = 3.0A, T = 0.2, Ustr = 0.1. 
Panel b: Analogous average data than in panel a but now Re = 6.54, T = 0.0, vstr = 
0.1. Molecular descriptor data are here computed with respect to the PDB system of 
coordinates. 


4.4. Protein descriptor in MBP stretching simulations 


As already mentioned for the Ubiquitin protein, placement vectors r; refer 
to reference (equilibrated) configuration and do not depend on time. Here re- 
sults relevant to the first stretching protocol are presented. In fact for the MBP, in 
contrast to what shown previously, we find it sufficient to pull just one end of the 
molecule (keeping the other terminus blocked in space) to gain an appropriate 
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picture of protein shape evolution. As usual, the external force generated by the 
ghost atom that drives the deformation pathway is included in the computation 
of the self-action and the square root of the reciprocals of the eigenvalues of U 
are again depicted as the semi axes of an ellipsoid. 

In what reported in Figure 5.21 (N-terminus pulling, Re = 7.5, T = 0.75, 
Ustr = 0.1/t,), the material element volume |e| has been optimized for quan- 
titative comparisons. In this case raw data are worked out to gain the trends 
and images from panel a2 to panel f. First, original evolutions are smoothed by 
supposing a process with memory. Namely a linear-weighted average over 1000 
previous values is performed for each data. This way trends can be softened and, 
especially, a ? (the one that practically governs the description) is closer to bb; 
evolution, also in the equilibration part. Panels a2, b2 and c2 report the outcome 
of the application of this algorithm to the a” * trends. Finally, to achieve the 
ellipsoid snapshots only (panels d, e and f) a magnification coefficient equal to 4 
is used to recover a high-enough value of the first peak, as it can be noted that it 
is considerably reduced by the smoothing. No magnifications or reductions are 
used for Àz 1/2 and ? asthe average trend of those is deemed appropriate. To 
get the pictures, the center of the ellipsoid is conveniently attached to the center 
of mass of the molecule and its orientation is governed by the eigenvectors of 
U. Thanks to these expedients, the description appears appropiate, despite the 
evolution of the medium and smaller axes may be deeper. 

We add that in the present approach the equivalent radius of the sphere hav- 
ing the same volume of the ellipsoid follows a trend dominated by the first eigen- 
value, that is a trend qualitatively equivalent to the radius of gyration one (the 
volume increases in a stretching simulation). 


4.5. MBP morphological descriptor translocation results 


In translocation simulations we find that the presence of the pore, which 
constitutes a constraint that limits accessibility in phase space, enhances the 
quality of the protein shape description. To the sake of clarity, in the present 
approach, both the pulling force and the actions, due to the channel potential, 
are included in the computation of the z, as external forces. The confinement 
acts similarly to a decrement of the temperature, limiting residue fluctuations 
inside the pore. Obviously, to achieve a proper outcome, either the volume of 
the material element |e| needs to be small or the pore long enough to provoke a 
considerable stretching, detectable by the eigenvalues of U. The issue is well 
summarized in Figure 5.22, where results related to two different pore lengths 
are depicted, holding the material element volume constant at a standard value 
(equivalent sphere radius: 30A). Thus, once again, main limitations are due 
to the necessity to fine-tune such a volume on the basis of the simulation 
performed, or rather, according to a sort of ‘maximum value of deformation. 
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Figure 5.21: Raw and elaborated data are shown for a stretching simulation of Maltose 
Binding protein. As evident from the first row, to gain a better agreement between peak 
magnitudes, |e| should be further reduced, but this would generate instabilities in the 
routines used for polar decomposition. When depicted, protein dimensions are normal- 
ized by the radius of gyration own of the PDB crystallized structure. Panel al: Trend of 
the original AY ? as resulting from fine tuning of |e|. The peak is high, reaching a value 
eleventh times greater than the initial one. Its evolution is also tight and sharply decreases 
during thermalization. Panel b1: Despite the reduction in the value of |e| the drop at max- 
imum detachment does not overcome 10% of the initial value and is equal to 20% at the 
end of the simulation: the dull az” 2 dynamics is a limitation of the model. Panel c1: The 
minimum value at 10000¢,, corresponds to a drop of 30% of the initial value and has 
to be compared to a 70% reduction in bb3, so the agreement is not very close in mag- 


nitude. Panel a2: 1° 2 trend resulting from a weighted running average. Specifically, 
the weights are linear (from 0 to 1) and spread over the current value (weight = 1) back 
to the one-thousandth previous one (weight = 0). The approach allows improving the 
agreement between Pia * and bbi, especially in the thermalization part. Panel b2 and c2: 
Same approach just described applied to is ? and Ag 2. A similar smoother behavior is 
detected and amplitudes are reduced. Panel d: Snapshot of the ellipsoid and molecule at 
the beginning of the pulling. The protein is compact and the ellipsoid is close to a sphere. 
Here and in panels e and fa magnification coefficient is applied to the ellipsoid semi axis 
to recover the amplitude of the peak smoothed by the averaging process. Panel e: Ellip- 
soid and protein near the maximum value of the stretching. A remarkable elongation is 
detected in agreement with the almost linear fashion of the polymer. Panel f: Snapshot 
taken at the end of equilibration. The morphological ellipsoid shrinks still too quickly 
from its maximum elongation when compared to its shape in panel d. 
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Panel a, b and c underline the difference in the amplitudes of the square root 
of the reciprocals of the eigenvalues once the channel is elongated from 100A to 
600A. Panels d through f compare the MBP shape and the ellipsoid associated 
with U once projected onto the plane formed by eigenvectors 1 and 3 (600A chan- 
nel). The choice of the plane is not by chance, as again we observe that eigen- 
values 1 and 3 govern the dynamics, while the second one experiences just lit- 
tle fluctuations. The outlined approach confirms its appropriateness to summa- 
rize shape evolutions for a wide range of simulation parameters in constrained 
dynamics. We performed several transport simulations across a nanopore by 
varying the cut-off radius, pore length, temperature, driving force (always con- 
siderably great, to perform translocation in a limited time window) and pulling 
terminus, without remarkable limitations in the description (providing that the 
transport be homogeneous in time). 

However, the model is sensitive to the value of the pulling force: a greater 
value causes higher values of interatomic forces fij, which are detected by Zs. 
Hence, two translocation simulations, which are equivalent from a macroscopic 
deformation point of view (same channel), just performed at different driving 
forces, are described by different v. The one relevant to the greater value of the 
pulling force generates the ellipsoid that stretches the most, whereas instead the 
gross deformation of the proteins is practically the same. A similar behavior is 
also noticed in stretching runs when detachment velocity is increased. 


5. Additional continuum formulations 


As we have seen, the proposed approach based on Cauchy-Born rule and 
power equivalence, appears to be robust enough only when external forces are in- 
cluded in the computation of z,. Also, the dynamics has to be, in a sense, driven, 
constrained, by the external environment. However, additional preliminary re- 
sults (not shown here since they are deemed unreliable as a proper statistics of 
event has not been collected yet) seem to show that external forces might not be 
essential for MBP stretching and translocation simulations. In other words the 
effect of the number of degrees of freedom of the system might require further 
investigations. 

What appears evident are the limitations of the model to capture thermal un- 
folding. In thermal simulations the homogeneity is lost due to random single- 
bead oscillations that overwhelm and hide the overall unfolding process. The 
drawback can be overcome by selecting different descriptors of the protein evo- 
lution and related modified expressions of zs, which arise from evaluations of 
the interaction mechanisms among material points in the molecule. 
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Figure 5.22: Translocation simulations for Re = 3.0, C-terminus pulling, T = 0.75, F = 
2.0. Panels a, b and c: Trends of the reciprocals of the eigenvalues for two values of the 
pore length (100 and 600A). Despite a lower overall deformation if compared with the 
stretching simulations, the presence of the pore allows a better description of the molecule 
shape evolution. Especially pra 2 (panel c) appears to be particularly sensitive to the 
confinement, its decreasing trend is considerable, reaching a minimum equal to about 
one half of the initial value, regardless the pore length. i ? is instead influenced by the 
latter quantity (panel a), being negligible in the 100A pore. Finally, the evolution of the 
second eigenvalue, although again pretty limited, is remarkable and usually greater than 
in the steered-dynamics case. Panels d, e and f: Direct comparison between the ellipsoid 
projected onto the planes associated with eigenvectors 1 and 3 and the molecule seen in the 
xz plane of the PDB system of coordinates. The quality of the description is considerable 
regardless the frame of reference employed. Due to the simulation protocol (alignment 
of the PBD x-axis with the pore axis and therefore with the pulling force direction), all 
the frames are very close to each other. 


5.1. Varying the meaning of v 


We change here the meaning of v and consider it as a work tensor density 
performed ‘in average’ in deforming the whole protein. Also here we assume 
det(v) > 0. This way ù is a power and it is supposed to be equal to the power 
density in the discrete model. The equation v = —z, is now a tensor power bal- 
ance. The structure of zs, here indicated by 240, is that of a tensor power density. 
We express (heuristically) 252 as the sum of the tensorial product between the 
absolute value of the total force | f;;| exerted between residues i and j and the 
time rate of change of the absolute value of the placement vector of the material 
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points with respect to the center of mass of the molecule, d;|r;|. It is important 
to point out that here | - | indicates the absolute value of the components of the 
considered quantity, that is |x| is a vector which entries are the absolute values of 
the entries of vector x, it is not its norm, which in turn would be indicated by the 
symbol ||x||. In a sense the dynamics here is like projected onto the first octant of 
the Cartesian space R3, through proper reflections (associated with the absolute 
value operator) that allow us to distinguish between departure or approach of a 
residue with respect to the center of mass, regardless their mutual placement in 
the frame of reference selected to assign point-coordinates. We write 


= > ke 


le| isjee 


252 = fi|® Ari]. (5.7) 


In the above expression k;*? is a dimensionless scalar quantity defined as the 
ratio between the actual distance of the point of application of the force from the 
center of mass of the molecule, |; |, and the radius of gyration of the crystallized 
PDB structure of the protein, ae 
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The action f;; involves all the elementary terms present in the G6-like model, 
such as actions stemming from peptide bonds, bending deformation and so on, 
equivalently to the previous approach. As evident from (5.7), not only the square 
root of the reciprocal of the eigenvalues of the symmetric matrix due to polar de- 
composition of v are analyzed, but also the diagonal entries of v are investigated 
and a suited ellipsoid built from them. 


Thermalization simulations results. For a matter of brevity, we show only 
thermalization simulation results. In fact only in this case the first approach con- 
sidered in the present chapter is not appropriate. However, similar outcomes (in 
terms of the quality of the overall picture) hold also for stretching and translo- 
cation simulations. Nevertheless, the description of thermal unfolding is the 
toughest task. The approach analyzed here is mainly tailored to follow protein 
deformation when diagonal components of v are considered. However, also 
symmetric matrix eigenvalues provide an adequate geometrical description of 
the molecule. 

In Figure 5.23 we show the trends of v diagonal entries for a molecule with 
cut-off radius equal to 7.5A, while undergoes equilibration. As already shown 
(Fig. 3.5), since the simulation temperature is 0.75, MBP remains compact. Also, 
that behavior can be easily detected by the lack of increment in the bounding box 
edge lengths, as they simply oscillate around constant values regardless the frame 
of reference used (all panels of Fig. 5.23). In panels a, b and c we compare the 
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trends of the square root of the reciprocals of the eigenvalues of U with the evolu- 
tion of the sides of the limiting rectangular parallelepiped in the central principal 
frame, despite the simulation runs with respect to the PDB system of coordinate. 
We do this because we believe that the description by means of symmetric matrix 
eigenvalues should be more general than the one based on the diagonal entries 
of v. All the panels show a good agreement between the two trends reported in 
each of them. The eigenvalues do not show the almost monotonic trends as the 
ones in Figure 5.17. Rather, they randomly fluctuate around the initial values, a 
behavior in accord with the bounding box one. However, fast spikes are not re- 
produced and sometimes a sort of opposite behavior is detected. Panels d, e and 
f report the evolution of v diagonal entries and bounding box edges in the PDB 
frame of reference (bbz, bby, bbz). v diagonal entries are indeed tied to the sys- 
tem of coordinates where quantities are evaluated. It is evident how much these 
trends faithfully reproduce the evolution of the molecule shape even in such a 
random dynamics. The magnitude of the oscillations can be easily changed by 
altering the value of the material element volume once the approach here ana- 
lyzed is chosen. Fast spikes are closely matched. Such an agreement holds true 
for all the simulations we have performed, regardless the value of temperature 
and cut-off radius. Therefore the associated ellipsoid always describes the pro- 
tein shape in a pointwise fashion, not only in average. Unfortunately, neither the 
eigenvectors of v nor of U, (not even the rotation matrix arising from polar de- 
composition) provide a straightforward way to understand the orientation of the 
polymer. Also, in this particular case, since the molecule remains folded, v diag- 
onal components are really able to follow the details of the deformation pathway 
(bounding box oscillations are only a few Angstroms), which is not macroscopic. 
To understand what just stated, see Figure 5.24. It clearly shows the ability of the 
current approach to distinguish between a protein that remains compact in a 
heat reservoir and one that instead unfolds. Indeed, panels a through c compare 
three quantities: square root of the reciprocals of the eigenvalues of U for a 3.0A 
cut-off radius molecule, the relevant evolution of the bounding box sides and 
the square root of the reciprocals of the eigenvalues of U for a protein that re- 
mains folded. First of all we underline that the agreement between eigenvalues 
and associated edges of the limiting rectangular parallelepiped is appropriate on 
an average sense, since bounding box spikes are not so closely followed by ellip- 
soid semi-axes. However, the magnitude of the deformation is clearly detected: 
fluctuations in the evolution of a” * associated with the folded molecule are 
hardly visible. Such quantities remain practically flat and very close to the initial 
value in a macroscopic standard, that is the one determined by the fluctuations 
belonging to the 3.0A cut-off radius MBP. It is evident that we can recognize 
that the latter one is undergoing big deformations (thermal unfolding), while 
the former one practically remains in a native-like conformation. Also, now it is 
clear what we meant about the ability of the selected approach to follow the little 
details of the deformation process, even the smallest changes in the shape are 
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Figure 5.23: Thermalization simulation results for MBP. Re = 7.5, T = 0.75. The protein 
remains folded with these parameters. Panels a, b and c: Comparison between the bound- 
ing box side lengths and the square root of the reciprocals of the eigenvalues of U. The 
former are computed in the principal central frame, the latter are relevant to the polar de- 
composition of v evaluated in the PDB coordinate system. In panel a the highest-valued 
eigenvalue trend is shown, in panel b the middle-valued one and, accordingly, in panel c 
the lowest. All three show a remarkable agreement with the trends of the bounding box 
edges, especially considering that are relevant to different frames, in compliance with the 
need that the eigenvalues of the symmetric matrix should have a general meaning. Panels 
d, e and f. Comparison of v diagonal components and bounding box sides in the PDB 
frame. The agreement is in all cases at pointwise level, spikes are correctly reproduced, 
despite the gross deformation of the molecule is negligible as no thermal unfolding takes 
place. 


recorded and are scaled correctly, providing that the material element volume of 
the compared evolutions be the same. 

The same picture is confirmed in panels d, e and f of Figure 5.24 where diag- 
onal components are considered. Also in this case unfolding is clearly detected. 
Moreover, the agreement between the ellipsoid semi axes and the bounding box 
sides appears adequate not only on average but pointwise. If the ellipsoids are 
plotted, the difference in their dimensions is evident and we can detect which 
molecule is unfolding, as shown in Figure 5.25. Data used for this figure are the 
ones also reported in Figure 5.24, when the difference between the first eigen- 
values is close to the maximum one. 


5.2. zs model 3 and 3* 


Explicit formulation. We can also choose different expressions of zs as explicit 
constitutive structure. Two of them are discussed here. The first one, indicated 
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Figure 5.24: Thermalization simulation results for MBP. Re = 3.0À, and Re = 7.5A, 
T = 0.75. The molecule with the lower Re unfolds, the other one remains compact. The 
material element volume is the same for the two cases, equal to a sphere of radius 20A. 
Panels a, b and c: Comparison between the bounding box side lengths (solid dark line) 
and the reciprocals of the eigenvalues of the 3.0A Re molecule (dotted line). Also the 
reciprocals of the eigenvalues of the 7.54 Re protein are showed for a quantitative com- 
parison (dashed line). The agreement between the 3.0A case and relevant bounding box 
sides is appropriate in all panels. Also, the current approach is able to distinguish between 
a molecule that undergoes thermal unfolding and one that does not, accordingly to the 
difference in az” 2 magnitude. Panels d, e and f: Comparison of v diagonal elements and 
limiting parallelepiped edges in the PDB frame of reference for the Re = 3.0A simula- 
tion. The agreement is appropriate in all cases and even fast spikes are closely followed. 


by Zs3, reads 


1 z 
SAT" q DO hig (fag Lg) ®dilril. (5.8) 


isjee 
Here l;; is a dimensionless vector defined as 


Tij Tij 


reg lJ iral 


ley = Tij š | (5.9) 


where r;; is the position vector of material point j with respect to i (note that 
Tij = —Tyji): kiš? is a dimensionless quantity defined by 


0 
k7 = [ral - Iris Jr; I (5.10) 
t 
7 Ileal- irill roir 
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Figure 5.25: Comparison of the ellipsoids (and associated proteins) for the two cases al- 
ready compared in panels a, b and c of Figure 5.24. It is apparent which is the molecule 
unfolding just from the difference in the shape of the ellipsoids, which refer to the square 
root of the reciprocals of the eigenvalues of U. Snapshots are taken when the difference 
between the first eigenvalues is close to the maximum one, according to Figure 5.24a. 
The palette indicates the z-axis values of the upper half portion of the ellipsoids. Protein 
dimensions are normalized by Tia 


where, ri is the placement vector of residue j referred to residue i in the refer- 
ence configuration of the molecule and other symbols have been already defined. 
Finally ¢ is a vector the entries of which are all unity. 

In this approach the first quantity involved in the tensor product is an effec- 
tive measure of the local action between amino acids 7 and j, while the second 
quantity, once summed upon all points and integrated in time, is a gross measure 
of the evolution of the distances within the molecule thought as a whole. Since 
the actions practically result from the projection of the forces exchanged between 
amino acids è and j onto the vector joining them, only peptide interactions and 
native contacts (Lennard-Jones potentials) are considered in the computation of 
3. The essential idea behind this approach is that d;|r;| governs the evolution of 
zs3. Therefore the whole left side term of the tensorial product, i.e. ki (Fasti) 
is tailored in a way that it has a constant sign when the forces are generated by 
harmonic or harmonic-like potentials, as peptide bonds or Lennard-Jones in- 
teractions, respectively. Here ‘harmonic-like potentials’ means potentials that 
are attractive in a region of the phase space and repulsive in the complementary 
one. For a matter of clarity we shall refer to this approach as SAM3 in the present 
section. 

An alternative formulation that stems from this model and actually reduces 
the whole evolution of v to the evolution of its first diagonal entry, is subse- 
quently described and will be addressed to as SAM3*. This approach allows one 
to drop the absolute value of the quantity r; in the time partial derivative of (5.8), 
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without losing the ability to discerning between detachment or approach of an 
amino-acid with respect to the center of mass. The aim is accomplished by simply 
evaluating the vector r; in a local frame of reference that has its x-axis coincident 
with the vector itself, namely each residue has a local proper frame of reference 
where its placement vector (with respect to the center of mass) is evaluated. In 
the specific case, the placement vector itself fixes the x-axis of such a coordinate 
system. Thence only the first entry of r; is different from zero and so only the 
first diagonal component of 243 evolves. In practice, the order parameter is re- 
duced from a second-rank tensor to a scalar. However, tensor rank can be easily 
restored claiming a volume-preserving deformation: in this way the other two 
diagonal entries of v can be deduced by requiring that the ellipsoid associated 
with them would have a constant volume, equal to the initial one (the volume 
of the material element). Since there is no preferred assumption, in the volume- 
preserving enforcement the two indeterminate axes (i.e. the ones that are not 
governed by the self-action) are set equal to each other. Similarly to the second 
approach introduced in section 5.1, both the square root of the reciprocals of the 
eigenvalues and v diagonal entries are considered for SAM3, while for SAM3* 
only the latter ones are evaluated. 


Main results. In this section all results for SAM3 and SAM3* are briefly out- 
lined. Generally speaking, the formulations analyzed here require the compu- 
tation of several additional quantities or extra frames for every time-step dur- 
ing a simulation, resulting more time-consuming than the two previous ones. 
For this reason, no translocation simulations have been performed. Moreover, 
the already presented outcomes are deemed appropriate for the purpose of the 
present investigation. However, a few results are pointed out as the quality of the 
description for both SAM3 and SAM3* is adequate. As far as (5.8) is concerned, 
the diagonal components of v provide a clear description of the MBP shape in 
all the runs performed, as it was expected. In this case, the value of e can be very 
small and so v entries very large. As it results from Figure. 5.26, (1, 1) leads the 
dynamics and closely follows the trend of the associated bounding box edge in 
the central principal frame for an AFM-like stretching simulation. The evolution 
of the remaining two entries is negligible, although panel b and c highlight the 
fact that their trends are not so close to the edges of the limiting parallelepiped. 
Usually the appropriateness of the picture improves in the PDB system of coor- 
dinates. 

We find that only two of the eigenvalues of U evolve, as one of them is con- 
stantly equal to 1 regardless the involved dynamics. Hence the picture associated 
with the AY ? is just two-dimensional, as show in Figure 5.26d and relative in- 
set. This 2D description is in a sense in accord with the evolution of the gross 
shape of the molecule (as basically one eigenvalue forms a peak and the other 
one a minimum), but was not expected. 
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As already stated, it is in the PDB frame of reference that SAM3 works better. 
As it is depicted in Figure 5.27, v diagonal entries closely follow the evolution of 
the bounding box edges during a thermalization simulation. The agreement is 
considerable and the curves reported in both panels a and b are hardly distin- 
guishable (this holds true also for the diagonal component not depicted). Quan- 
titative agreement is just a matter of fine tuning the material element volume. 

SAM3* approach generates a 1D evolution of the self-action and therefore 
also of the associated v. However, the 3D evolution can be recovered artificially 
claiming a volume-preserving transformation for the ellipsoid, as already men- 
tioned. In such a way it is probably obtained the most appropriate description 
of all the ones presented in the thesis, as the constant-volume assumption allows 
both Az” * and a3” ? to considerably decrease from the initial value. The el- 
lipsoid picture is drafted in Figure 5.28 for a stretching simulation, considering 
both the trends of the square root of the reciprocals of the eigenvalues (upper 
picture) and a snapshot of the system close to the maximum elongation of the 
molecule (panel b). From the latter panel it is immediately appreciable the qual- 
ity of the description as the almost single file conformation of the MBP gives rise 
to an ellipsoid that is an extremely thin and long spindle. 


5.3. Remarks 


Three additional formulation of z, have been considered to overcome the 
limitations of the basic approach. Among them, the first one (sec. 5.1) shows 
the best premises, as it is computationally inexpensive and allows a proper de- 
scription of the molecule shape in all the simulations performed. SAM3 and 
SAM3* approaches require the computation of several additional quantities at 
every time step, considerably slowing down the efficiency of the numerical code. 
Also, the symmetric matrix for SAM3 results two-dimensional, regardless the 
simulated dynamics. SAM3*, when associated to a volume-preserving defor- 
mation requirement, generates an evolution of the ellipsoid in close agreement 
with the real deformation of MBP. However, no translocation simulations have 
been performed for SAM3 and SAM3*. Also, systems smaller than the MBP 
have not been studied in this frameworks. However, there are no doubts that 
the appropriateness of the approaches here analyzed will not be compromised 
by the dimensionality of the system, since the associated z, formulations arise 
from considerations on deformation mechanisms between two single material 
points. 

What is necessary to further develop are the physical consequences of se- 
lecting empirical formulations for zs, as the description of the material element 
shape is only the first step to be undertaken in the building up of a continuum 
model for a complex body. 
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Figure 5.26: Mechanical stretching simulation for MBP. Panel a: Comparison between 


v(1,1) and the bounding box x-axis edge in the central principal frame, bb; . 


v entry 


ends up leading the whole dynamics of the ellipsoid and it shows a trend totally in com- 
pliance with the gross shape evolution of the MBP. Panel b and c: v(2,2) and 1(3,3) 
components compared with the associated edges. The trends are not so in agreement as 
in panel a, but it is appropriate to point out that amplitudes are both negligible if com- 
pared with v(1,1). Panel d: This panel and the relative inset shows the trends of the 
two evolving symmetric matrix eigenvalues. The third eigenvalue (not shown) is always 
constant, independently from the performed simulation. Hence the ellipsoid evolution 
associated to U is only 2D for SAM3. However A correctly reproduces the elongation 
while a3” > depicts the shrinking of the shape. We note that symmetric matrix eigen- 
values undergo oscillations that are considerably smaller than the ones of the diagonal 
entries of v. 
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Figure 5.27: Panel a: Here the trends of (1,1) and the x-axis edge of the bounding box 
in the PDB frame are compared. Trends are so close to each other that a visual distinction 
is tough. It is in the PDB frame of reference that SAM3 achieves the best description of 
the MBP shape as showed in this case for the challenging thermalization case. Panel b: 
Similarly to panel a, (3, 3) and bb, are compared, pointing out again the close similarity. 
v(2,2) and bby trends are not shown but their agreement is comparable with the ones 
reported in this figure. 
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Figure 5.28: Panel a: Trends of the square root of the reciprocals of the eigenvalues as- 
sociated to SAM3* for a mechanical stretching simulation of MBP. x ? and az” are 
obtained claiming a constant volume deformation for the ellipsoid and are set equal to 
each other. The trends result in close agreement with the overall shape deformation of 
the protein in a AFM-like stretching simulation. Panel b: Visual comparison between 
the almost linear shape of the MBP at maximum elongation and the associated ellipsoid, 
which has been obtained from the data of panel a at maximum stretching multiplied by 
the radius of gyration of the native MBP (also, a magnification coefficient of 2 for the semi 
axes has been necessary, although a fine tuning of e would have worked the same). The 
affinity of the two shapes is here substantial as the constant volume assumption allows to 
achieve an adequate shrinking of both the transverse axes of the ellipsoid. 
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